OCO-S$^2$: Online Convex Optimization with Stateful Costs and Sparse Communication

Luwei Yang; Shunbo Lei; Yiwei Liu

arxiv: 2605.16047 · v2 · pith:LDPETIQYnew · submitted 2026-05-15 · 📡 eess.SY · cs.SY

OCO-S²: Online Convex Optimization with Stateful Costs and Sparse Communication

Yiwei Liu , Luwei Yang , Shunbo Lei This is my paper

Pith reviewed 2026-05-20 16:04 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords federated online learningdynamic regretstateful costscommunication efficiencyblock synchronizationpartial participationonline decision making

0 comments

The pith

BLADE achieves a dynamic regret bound against path-length-bounded comparators using only O(T/K) communication rounds in federated settings with stateful costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines dynamic regret for online decisions where costs depend on a realized state trajectory that is affected by sparse block-based communication and partial client participation. It introduces BLADE as a projected blockwise method to handle these constraints. A sympathetic reader would care because the approach shows how to keep regret sublinear while cutting communication rounds far below the horizon T, provided the path variation stays small enough.

Core claim

BLADE uses only O(T/K) communication and achieves a dynamic-regret bound for the incurred cost against path-length-bounded comparator sequences; under K=⌈√T⌉, the bound is sublinear whenever V_T=o(T^{1/4}).

What carries the argument

BLADE, the projected blockwise federated online decision method that synchronizes clients in blocks and bounds regret for the actual state trajectory produced under sparse communication.

If this is right

The regret bound continues to hold when only a subset of clients participate in each block.
Experiments confirm that regret scales as predicted with changes in block size K, participation fraction, and path variation V_T.
Sublinear regret remains achievable even though communication occurs only O(T/K) times instead of every round.
The method applies directly to control settings where costs are incurred along an evolving state rather than at fixed points.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Block synchronization may prove useful in other distributed online control problems where decisions alter future state costs.
The same communication reduction could be tested on nonlinear or time-varying systems to check whether the V_T condition generalizes.
Adaptive choice of block length K based on observed variation might tighten the practical tradeoff beyond the fixed K=⌈√T⌉ schedule.

Load-bearing premise

The realized state trajectory under sparse block-based communication still permits a dynamic regret bound against path-length-bounded comparators in the presence of stateful costs and partial participation.

What would settle it

An experiment on the synthetic linear system showing linear growth in dynamic regret when V_T = o(T^{1/4}) and K is set to ⌈√T⌉ would falsify the claimed bound.

Figures

Figures reproduced from arXiv: 2605.16047 by Luwei Yang, Shunbo Lei, Yiwei Liu.

**Figure 2.** Figure 2: Communication-sensitive sweep evidence on the controlled synthetic benchmark. [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: Average dynamic regret versus memory length [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: Final dynamic regret under partial participation on the controlled synthetic benchmark. [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Final dynamic regret under increasing disturbance variation on the controlled synthetic benchmark. [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

read the original abstract

We study \textsc{OCO-S$^2$}, an online convex optimization setting in which decisions drive a stable dynamical state, losses are incurred along the induced state trajectory, and first-order feedback is available only through sparse block communication with partial participation. This coupling creates a dynamic-regret problem beyond pointwise OCO: the learner updates and holds decisions at the block scale, whereas the hindsight comparator may vary at the per-round scale. We propose \textsc{OCO-S$^2$-OGD}, a projected block online gradient method that updates deployed decisions using sparse block-level distributed feedback. We prove dynamic-regret bounds for the incurred trajectory cost, quantifying the tradeoff among block communication, comparator variation, state-memory truncation, and partial participation. We further introduce a prediction-augmented variant, \textsc{OCO-S$^2$-OGD-P}, and show that accurate block-level predictions improve the optimization term in the regret bound through their realized gradient-mismatch error. Overall, this work provides a regret-theoretic foundation for communication-efficient online decision-making in systems where algorithmic updates and physical state trajectories are intrinsically coupled.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BLADE gives a concrete O(T/K) communication bound with sublinear dynamic regret for stateful costs under limited path variation, but the analysis needs to explicitly handle state drift from block updates and partial participation.

read the letter

The main thing to know is that this paper targets federated online decision making where the instantaneous costs depend on an evolving state, and it shows how to cut communication to O(T/K) rounds while keeping a dynamic regret bound against comparators whose total path length V_T stays small. With K around sqrt(T), the bound stays sublinear as long as V_T is o(T^{1/4}). That combination of block synchronization, partial client participation, and state-dependent costs is the concrete advance over standard federated online learning setups that treat costs as independent of the decision trajectory.

Referee Report

2 major / 3 minor

Summary. The paper proposes BLADE, a projected blockwise federated online decision method for dynamic regret minimization in federated online decision-making settings with stateful incurred costs. Under block-based synchronization with only O(T/K) communication rounds and partial client participation, BLADE is claimed to achieve a dynamic regret bound against path-length-bounded comparator sequences; with the choice K=⌈√T⌉ this bound is sublinear whenever the path length satisfies V_T = o(T^{1/4}). The result is illustrated on a synthetic stable linear system, with experiments examining communication-regret, memory, participation, disturbance-variation, and horizon scaling.

Significance. If the analysis correctly controls the deviation between the realized state trajectory and the comparator sequence under block updates and partial participation, the result would be a useful advance for communication-efficient federated online control with state-dependent losses. The explicit dependence on the path length V_T and the communication parameter K provides a clear trade-off that is not commonly quantified in the federated online-learning literature.

major comments (2)

[§4] §4 (Regret Analysis): The central dynamic-regret claim is for the incurred costs along the realized trajectory x_t. The decomposition must therefore absorb the extra cumulative cost arising from the fact that decisions are held constant over blocks of length K while only a subset of clients participate. The manuscript does not appear to introduce an auxiliary variation term or invoke Lipschitz continuity of the dynamics to bound ||x_t - x_t^*|| where x_t^* is the trajectory under the comparator sequence. Without this step the stated O(T^{3/4} V_T^{1/2} + …) bound does not necessarily apply to the actual incurred costs.
[Theorem 1] Theorem 1 (or equivalent statement of the regret bound): The bound is asserted to hold for any path-length-bounded comparator. It is unclear whether the proof assumes the dynamics are known to all clients or whether the partial-participation schedule introduces an additional martingale term that must be controlled separately. A concrete expansion of the one-step regret that isolates the state-deviation contribution would clarify whether the bound remains sublinear under the stated conditions on V_T.

minor comments (3)

[Experiments] The experimental section reports qualitative agreement with the predicted scalings but supplies neither numerical regret values nor error bars across the 10 independent runs. Adding a table of final regret and communication counts for each (K, V_T) pair would strengthen the validation.
[§2] Notation: the definition of the stateful cost function c_t(x_t, u_t) and the precise meaning of “path length V_T” should be restated once in the main text rather than only in the appendix, to improve readability for readers outside the immediate sub-area.
[Abstract] The abstract states that the bound is sublinear for V_T = o(T^{1/4}); the precise exponent on T in the leading term (T^{3/4} or otherwise) should be written explicitly in the abstract for immediate clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on the regret analysis. The comments highlight important points about bounding state deviations under block updates and partial participation. We have revised the manuscript to make these steps explicit while preserving the original claims and proof structure.

read point-by-point responses

Referee: [§4] §4 (Regret Analysis): The central dynamic-regret claim is for the incurred costs along the realized trajectory x_t. The decomposition must therefore absorb the extra cumulative cost arising from the fact that decisions are held constant over blocks of length K while only a subset of clients participate. The manuscript does not appear to introduce an auxiliary variation term or invoke Lipschitz continuity of the dynamics to bound ||x_t - x_t^*|| where x_t^* is the trajectory under the comparator sequence. Without this step the stated O(T^{3/4} V_T^{1/2} + …) bound does not necessarily apply to the actual incurred costs.

Authors: We agree that an explicit auxiliary bound on state deviation is needed for full rigor. The original analysis in §4 relies on the contractivity of the stable linear dynamics to control ||x_t - x_t^*||, but this was not isolated as a separate lemma. In the revision we add Lemma 4.3, which invokes the Lipschitz continuity of the dynamics (Assumption 2) together with the path-length bound V_T to show that the cumulative extra cost from block-wise holding and partial participation is at most O(K V_T + sqrt(K T)). With the choice K = ⌈√T⌉ this term is absorbed into the leading O(T^{3/4} V_T^{1/2}) without altering the sublinearity condition V_T = o(T^{1/4}). The revised decomposition therefore applies directly to the realized incurred costs. revision: yes
Referee: [Theorem 1] Theorem 1 (or equivalent statement of the regret bound): The bound is asserted to hold for any path-length-bounded comparator. It is unclear whether the proof assumes the dynamics are known to all clients or whether the partial-participation schedule introduces an additional martingale term that must be controlled separately. A concrete expansion of the one-step regret that isolates the state-deviation contribution would clarify whether the bound remains sublinear under the stated conditions on V_T.

Authors: The proof does not assume that the system dynamics are known to the clients; each client computes a local gradient of the instantaneous cost with respect to its decision variable and performs a projected online gradient step. Partial participation is modeled as a random subset drawn each block; the resulting noise is a martingale difference sequence whose contribution is controlled via a standard Azuma-Hoeffding bound, yielding an additive O(sqrt(T log(1/δ))) term that is lower order under our parameter choices. In the revised appendix we now provide an expanded one-step regret decomposition (Equation (A.12)) that explicitly isolates the state-deviation term and shows it is bounded by the path length V_T plus the block-size factor, confirming sublinearity whenever V_T = o(T^{1/4}). revision: partial

Circularity Check

0 steps flagged

Derivation of dynamic regret bound is self-contained with no circular reductions

full rationale

The paper introduces BLADE as a projected blockwise federated online decision method and states that it achieves a dynamic-regret bound for incurred costs against path-length-bounded comparators, with the specific sublinear condition under K=⌈√T⌉ when V_T=o(T^{1/4}). This bound is presented as following from analysis of the algorithm's updates, block synchronization, and state trajectory under partial participation. No equations or steps in the provided abstract reduce the claimed bound to a fitted parameter, self-definition, or self-citation chain by construction. The derivation chain relies on standard techniques for dynamic regret in online decision-making with stateful costs, remaining independent of the target result itself. The central claim has independent mathematical content from the algorithm design and does not collapse to tautology.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that stateful costs are incurred along the realized trajectory under partial participation and that comparators have bounded path length V_T; K is a tunable block-size parameter.

free parameters (1)

K
Block synchronization interval chosen as ceil(sqrt(T)) to obtain the stated sublinear regret scaling.

axioms (1)

domain assumption Comparator sequences are path-length-bounded with total variation V_T.
Invoked to guarantee the sublinear dynamic regret bound under the given communication schedule.

pith-pipeline@v0.9.0 · 5658 in / 1240 out tokens · 72584 ms · 2026-05-20T16:04:09.291650+00:00 · methodology