Alice's Adventures in the Markovian World

Haoran Sun; Zhanzhan Zhao

arxiv: 1907.08981 · v1 · pith:PDOSKWWNnew · submitted 2019-07-21 · 📡 eess.SY · cs.LG· cs.RO· cs.SY

Alice's Adventures in the Markovian World

Zhanzhan Zhao , Haoran Sun This is my paper

Pith reviewed 2026-05-24 18:34 UTC · model grok-4.3

classification 📡 eess.SY cs.LGcs.ROcs.SY

keywords online learningFollow-the-LeaderGauss-Markov processesno-regret algorithmsLyapunov methodsmodel predictive controlonline optimization

0 comments

The pith

The Alice algorithm generalizes Follow-the-Leader to linear Gauss-Markov processes and proves no-regret performance without noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Alice, an online algorithm that makes decisions in an environment following a linear Gauss-Markov process with stochastic noise. It extends Follow-the-Leader by adding a momentum-style regularization term and a Lyapunov-inspired online constraint, without estimating parameters or value functions. The method needs only the state-action alignment relation and exact state observations. A no-regret guarantee holds exactly when noise is absent, and a sufficient condition supports the guarantee when noise is present. The approach is positioned as a mirror optimization to model predictive control.

Core claim

Alice generalizes Follow-the-Leader into the linear Gauss-Markov setting with a regularization term similar to momentum and a feasible online constraint from Lyapunov's Second Theorem, delivering a no-regret proof without state noise and a sufficient condition for the stochastic-noise case, while requiring only state-action alignment knowledge and exact state observations.

What carries the argument

The Alice algorithm, which performs mirror optimization to model predictive control by extending Follow-the-Leader with momentum-like regularization and a Lyapunov-derived constraint.

If this is right

No-regret holds exactly in the noise-free linear Gauss-Markov case.
A sufficient condition on the noise allows the no-regret property to extend to the general stochastic case.
The algorithm operates without parameter estimation, value-function estimation, or an initial stable policy.
Simulations confirm performance comparable to or better than recent alternatives while showing flexibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same regularization-plus-constraint pattern could be tested in other online convex optimization settings beyond linear dynamics.
If the sufficient noise condition can be relaxed, the method might apply directly to mildly nonlinear systems without model identification.
The Lyapunov constraint may offer a template for enforcing stability in other no-regret algorithms that lack explicit dynamics.

Load-bearing premise

The environment is exactly linear with stochastic noise, the learner knows only the state-action alignment relationship, and observes every state exactly.

What would settle it

An experiment on a linear system where the sufficient condition on noise is violated and cumulative regret grows linearly with time.

read the original abstract

This paper proposes an algorithm Alice having no access to the physics law of the environment, which is actually linear with stochastic noise, and learns to make decisions directly online without a training phase or a stable policy as initial input. Neither estimating the system parameters nor the value functions online, the proposed algorithm generalizes one of the most fundamental online learning algorithms Follow-the-Leader into a linear Gauss-Markov process setting, with a regularization term similar to the momentum method in the gradient descent algorithm, and a feasible online constraint inspired by Lyapunov's Second Theorem. The proposed algorithm is considered as a mirror optimization to the model predictive control. Only knowing the state-action alignment relationship, with the ability to observe every state exactly, a no-regret proof of the algorithm without state noise is given. The analysis of the general linear system with stochastic noise is shown with a sufficient condition for the no-regret proof. The simulations compare the performance of Alice with another recent work and verify the great flexibility of Alice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Alice algorithm for online decision making in linear Gauss-Markov environments subject to stochastic noise. Alice extends Follow-the-Leader by adding a momentum-style regularization term and a Lyapunov-inspired online feasibility constraint; it requires only knowledge of the state-action alignment and exact state observations, performs no parameter or value-function estimation, and claims a no-regret guarantee for the deterministic (noise-free) case together with a sufficient condition that extends the guarantee to the stochastic-noise case. Simulations compare Alice against a recent baseline.

Significance. If the no-regret claims hold under a mild and explicitly verifiable sufficient condition, the work would offer a parameter-free online method for linear dynamical systems that avoids both system identification and the need for an initial stabilizing policy. The combination of Follow-the-Leader, momentum regularization, and Lyapunov-style constraints is a distinctive technical contribution to online control.

major comments (2)

[Abstract and stochastic-noise analysis section] Abstract (final paragraph) and the section presenting the stochastic-noise analysis: the sufficient condition required for the no-regret guarantee under stochastic noise is referenced but never stated explicitly (no theorem, no inequality, no set of assumptions on noise moments or stability margins). Because the central claim for the general case rests on this condition, its absence prevents verification of whether the condition is mild or implicitly reintroduces observability or bounded-noise assumptions the learner is forbidden to use.
[Noise-free proof section] The noise-free no-regret proof (asserted in the abstract) is described at a high level but supplies neither the explicit regret bound, the induction steps, nor the precise role of the Lyapunov constraint; without these details the reader cannot confirm that the proof does not reduce to a fitted quantity or rely on hidden linearity assumptions beyond those stated.

minor comments (2)

[Simulations] The simulation section should report the precise regret metric, the horizon lengths, and the noise variances used so that the performance comparison can be reproduced.
[Preliminaries] Notation for the state-action alignment relationship and the feasible set should be introduced once with a single consistent symbol rather than re-defined inline.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and plan to revise the manuscript to incorporate the suggested clarifications.

read point-by-point responses

Referee: [Abstract and stochastic-noise analysis section] Abstract (final paragraph) and the section presenting the stochastic-noise analysis: the sufficient condition required for the no-regret guarantee under stochastic noise is referenced but never stated explicitly (no theorem, no inequality, no set of assumptions on noise moments or stability margins). Because the central claim for the general case rests on this condition, its absence prevents verification of whether the condition is mild or implicitly reintroduces observability or bounded-noise assumptions the learner is forbidden to use.

Authors: We agree that the sufficient condition is referenced in the abstract and analysis section but is not formulated explicitly as a theorem or set of inequalities. This is a presentation issue in the current manuscript. In the revised version, we will introduce an explicit theorem stating the sufficient condition, including the required assumptions on noise moments and stability margins derived from the Lyapunov constraint. The condition is designed to be mild and verifiable using only the known state-action alignment and does not reintroduce observability or bounded noise assumptions beyond what is stated. revision: yes
Referee: [Noise-free proof section] The noise-free no-regret proof (asserted in the abstract) is described at a high level but supplies neither the explicit regret bound, the induction steps, nor the precise role of the Lyapunov constraint; without these details the reader cannot confirm that the proof does not reduce to a fitted quantity or rely on hidden linearity assumptions beyond those stated.

Authors: The noise-free proof is presented at a high level to highlight the generalization of Follow-the-Leader and the role of the momentum regularization and Lyapunov constraint. We acknowledge that additional details would aid verification. In the revision, we will expand this section to provide the explicit regret bound, the key induction steps, and a precise description of how the Lyapunov constraint ensures the no-regret property using only the given assumptions of full state observation and known state-action alignment, without hidden assumptions or reducing to fitted quantities. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract and description present Alice as a generalization of Follow-the-Leader with added regularization and Lyapunov-inspired constraint, supplying an explicit no-regret proof for the noise-free linear Gauss-Markov case and only a sufficient condition for the stochastic-noise case. No equations or steps are shown that reduce a claimed prediction or theorem to a fitted parameter, self-definition, or unverified self-citation by construction. The linearity assumption defines the problem setting rather than being derived from the algorithm's output, and the proof is described as independent analysis. This is the normal case of a self-contained derivation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that the environment is linear with additive stochastic noise and that the learner has exact state observations plus state-action alignment knowledge. No free parameters or invented entities are declared in the abstract.

axioms (2)

domain assumption The environment is a linear Gauss-Markov process with stochastic noise.
Stated as the problem setting in the abstract.
domain assumption Learner knows only the state-action alignment relationship and observes every state exactly.
Required for the algorithm and the no-regret statement.

pith-pipeline@v0.9.0 · 5702 in / 1244 out tokens · 20009 ms · 2026-05-24T18:34:12.602092+00:00 · methodology

Alice's Adventures in the Markovian World

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)