pith. machine review for the scientific record.

math.OC

Optimization and Control

Operations research, linear programming, control theory, systems theory, optimal control, game theory

math.OC 2026-05-13 2 theorems

Alternation cuts cost in multi-objective stochastic optimization

Stochastic block coordinate and function alternation for multi-objective optimization and learning

Cycling through objectives and variable blocks matches single-objective convergence rates while lowering per-iteration cost.

Figure from the paper.

abstract:
Multi-objective optimization is central to many engineering and machine learning applications, where multiple objectives must be optimized in balance. While multi-gradient based optimization methods combine these objectives in each step, such methods require computing gradients with respect to all variables at every iteration, resulting in high computational costs in large-scale settings. In this work, we propose a framework that simultaneously alternates the optimization of each objective and the (stochastic) gradient update with respect to each variable block. Our framework reduces per-iteration computational cost while enabling exploration of the Pareto front by allocating a prescribed number of gradient steps to each objective. We establish rigorous convergence guarantees across several stochastic smooth settings, including convex, non-convex, and Polyak-Lojasiewicz conditions, recovering classical convergence rates of single-objective methods. Numerical experiments demonstrate that our framework outperforms non-alternating methods on multi-target regression and produces a competitive Pareto front approximation, highlighting its computational efficiency and practical effectiveness.
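The cost saving comes from touching only one (objective, block) pair per step. A minimal sketch of that alternation, with illustrative quadratic objectives, blocks, and step size (not the paper's experiments):

```python
import numpy as np

# Two quadratic objectives over two variable blocks: a toy stand-in for
# the multi-objective setting (objectives, blocks, and step size are
# illustrative choices, not the paper's experiments).
A1, A2 = np.diag([2.0, 1.0]), np.diag([1.0, 3.0])
b1, b2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
objectives = [(A1, b1), (A2, b2)]
blocks = [np.array([0]), np.array([1])]
eta = 0.01

x = np.zeros(2)
# Alternate over (objective, block) pairs: each iteration computes one
# block of one objective's gradient, a fraction of the cost of a full
# multi-gradient step.
for k in range(20000):
    A, b = objectives[k % 2]        # cycle through objectives
    idx = blocks[(k // 2) % 2]      # cycle through variable blocks
    g = A @ x - b                   # gradient of 0.5*x'Ax - b'x
    x[idx] -= eta * g[idx]
# With equal step allocation, x settles near the minimizer of the
# averaged objective, here (1/3, 1/4), up to an O(eta) alternation bias.
```

Allocating more steps to one objective biases the limit toward that objective's minimizer, which is how the framework explores the Pareto front.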
math.OC 2026-05-13 2 theorems

Mixed scores in diffusion reduce to geometric potential

Geometric Asymptotics of Score Mixing and Guidance in Diffusion Models

Small-time dynamics governed by weighted squared distances to data supports, for both mixture and amplified guidance

Figure from the paper.

abstract:
Diffusion models are routinely guided in practice by combining multiple score fields, yet the mathematical structure of score mixing is still poorly understood. We study the small-time generation dynamics driven by mixed scores $$ s=\lambda\,\nabla\log u_1+(1-\lambda)\,\nabla\log u_2,\qquad \lambda\ge 0, $$ in the heat-flow framework, where $u_1,u_2$ are heat evolutions of two compactly supported probability measures. This single formulation covers both the mixture-of-experts regime $(0\leq \lambda\leq 1)$ and the classifier-free guidance regime $(\lambda>1)$. Exploiting a Laplace-Varadhan principle under a similarity-time rescaling, we show that the small-time generation dynamics is governed by the explicit geometric potential $$ \Phi_\lambda=\lambda d_1^2+(1-\lambda)d_2^2, $$ which depends only on the supports of the initial measures and on the mixing parameter. This gives a rigorous reduction from a singular, non-autonomous score-driven dynamics to autonomous Clarke-type subgradient inclusions. In the empirical setting of finite Dirac mixtures, the limiting potential is piecewise quadratic with a Voronoi-type structure; this rigidity yields convergence of all autonomous limiting trajectories to critical points and a conditional convergence criterion for the original generation flow toward local minimizers of the potential, with rate $\mathcal O(\sqrt t)$ in the smooth stable case.
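For finite Dirac mixtures the limiting potential $\Phi_\lambda$ is piecewise quadratic over a Voronoi-type partition. A toy 1-D sketch of gradient descent on this potential (the supports and $\lambda$ are illustrative, not from the paper):

```python
import numpy as np

# Supports of two empirical (Dirac-mixture) measures; toy 1-D example.
S1 = np.array([0.0, 2.0])
S2 = np.array([1.0, 3.0])

def nearest(S, x):
    return S[np.argmin(np.abs(S - x))]

def grad_phi(x, lam):
    # a.e. gradient of Phi_lambda(x) = lam*d1(x)^2 + (1-lam)*d2(x)^2,
    # where d_i is the distance to the support S_i
    return 2*lam*(x - nearest(S1, x)) + 2*(1 - lam)*(x - nearest(S2, x))

lam, x = 0.5, 0.2
for _ in range(500):
    x -= 0.05 * grad_phi(x, lam)
# Within a fixed Voronoi cell (nearest points p1, p2) the potential is
# lam*|x-p1|^2 + (1-lam)*|x-p2|^2, so the flow settles at
# lam*p1 + (1-lam)*p2; here p1 = 0, p2 = 1 and x -> 0.5.
```

Taking $\lambda>1$ in the same expression flips the sign of the $d_2^2$ term, which is the guidance-amplification regime the abstract refers to.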
math.OC 2026-05-13 2 theorems

Dynamical systems converge strongly to min-norm saddle solutions

Convergence Analysis of Hessian-Damped Tikhonov Regularized Dynamics with Oscillation Control for Convex-Concave Bilinear Saddle Point Problems

Hessian-driven damping in second-order primal-dual dynamics reduces oscillations while guaranteeing strong convergence under time-varying Tikhonov regularization.

abstract:
In this paper, we propose a class of general second-order primal-dual dynamical systems with Tikhonov regularization and Hessian-driven damping for solving convex-concave bilinear saddle point problems. The proposed dynamical system incorporates five general time-varying terms: viscous damping, time scaling, extrapolation, Tikhonov regularization, and Hessian-driven damping parameters. Under suitable parametric conditions, we analyze the asymptotic convergence properties of the dynamical system by constructing appropriate Lyapunov functions. Specifically, we obtain the convergence rate of the primal-dual gap and the boundedness of trajectories in the proposed dynamical system, and provide some integral estimates. Furthermore, we theoretically prove that the trajectories generated by the dynamical system converge strongly to the minimum-norm solution of the saddle point problem, and fully demonstrate that Hessian-driven damping can effectively alleviate oscillations. Finally, numerical experiments are conducted to verify the validity of the above theoretical results.
math.OC 2026-05-13 2 theorems

SDP hierarchy solves quaternion polynomial optimization to global optimum

A Moment-QSOS Hierarchy for a Class of Quaternion Polynomial Optimization Problems

Moment-QSOS relaxations formulated directly in quaternions give monotonic lower bounds and scale via sparsity.

abstract:
This paper introduces a Moment-Quaternion-Sum-of-Squares (QSOS) hierarchy for solving a class of quaternion polynomial optimization problems. This hierarchy is formulated directly in the quaternion domain and consists of a sequence of semidefinite programming (SDP) relaxations that provide monotonic lower bounds on the optimal value. To improve scalability, we incorporate correlative sparsity, which can significantly reduce the size of the resulting SDPs for large-scale sparse problems. Furthermore, we introduce a strengthened QSOS relaxation, which enhances the tightness of the standard relaxation by enlarging the monomial basis in a controlled manner. Numerical experiments show that our approach provides bounds comparable to existing approaches while significantly reducing computation time and memory usage. In particular, applications to the quaternion-based maximum margin criterion problem and the classical orientation synchronization problem illustrate the practical effectiveness of the framework.
math.OC 2026-05-13 2 theorems

Self-exciting SDE control obeys a stochastic maximum principle

Stochastic control with self-exciting processes

Sufficient and necessary conditions via martingales replace dynamic programming for non-Markovian problems with event feedback.

Figure from the paper.

abstract:
We analyze the stochastic optimal control of SDEs whose driver includes a self-exciting stochastic process. Due to the non-Markovian nature of the problem, we apply the stochastic maximum principle approach. We derive a sufficient stochastic maximum principle under this framework. We also derive an expression, via martingales, for both the self-exciting process and its quadratic covariation. Furthermore, we derive a necessary maximum principle (an equivalence principle) for the self-exciting stochastic control problem. Finally, we consider an application to log-utility.
math.OC 2026-05-13 2 theorems

Funnel control guarantees error bounds for drill bit velocity

Analysis and funnel control for nonlinear drill strings

The design adapts the reference for wave delays in the nonlinear PDE-ODE model, keeping the tracking error inside a pre-set funnel.

abstract:
We study the output tracking problem for a vertically driven drill string system described by a nonlinear boundary-coupled PDE-ODE model. Solvability analysis of the drill string model is achieved by first casting the model as an abstract boundary value problem involving set-valued operators on an appropriate Hilbert space. The governing equation consists of an evolution part and a damping part. Existence of solutions is established within the framework of maximal monotone operators: one first proves that the evolution operator is a linear skew-adjoint operator and that the distributed damping term is a Nemytskii relation, which is then proven to be maximal monotone. Maximal monotonicity of the combined operator is then a consequence of Rockafellar's theorem. Furthermore, we propose a novel funnel control design that ensures the angular velocity of the drill bit follows a dynamically adjusted reference trajectory, while the tracking error remains confined within a pre-specified performance funnel. The reference adjustment mechanism adapts in response to large wave traveling times that may cause performance degradation. The corresponding feasibility result is illustrated by simulations.
math.OC 2026-05-13 2 theorems

Diversified routes nullify correlation value in energy resilience

Securing the Flow: Maritime Energy Resilience under Correlated and Decision-Dependent Disruptions

Indian maritime imports show the same hedging works for joint or single chokepoint disruptions across stress tests.

Figure from the paper.

abstract:
We develop a two-stage stochastic multi-commodity flow model to design a resilient maritime energy supply network under correlated chokepoint disruptions. A planner selects strategic inventories and infrastructure activations prior to uncertainty resolution, then routes crude oil, LNG, LPG, and fertilizer through a capacitated network. Three features distinguish this model: disruption scenarios are \emph{correlated}, reflecting the reality that proximate chokepoints (e.g., Hormuz, Bab el-Mandeb) fail jointly; scenario probabilities depend endogenously on first-stage decisions via affine distortion, formalizing \emph{risk exposure through utilization}; and a mean-CVaR objective mitigates tail-risk shortages. Methodologically, the decision-dependent probability model admits an exact MILP reformulation via McCormick linearization. CVaR preserves scenario-wise decomposability, and our Benders decomposition with corridor-based group-failure cuts converges finitely. The model is calibrated to Indian maritime energy imports (16 nodes, 28 arcs) using EIA, UNCTAD, World Bank, and operational data from the 2026 Hormuz crisis. Benders recovers the extensive-form optimum for scenario sizes up to $|S|=729$ with a constant iteration count (10-11). Empirically, the value of the stochastic solution (VSS) is 14.8%; the value of decision-dependent probabilities (VEP) ranges from 0.93% to 8.18%. The mean-CVaR frontier exhibits a design phase transition at confidence level $\alpha\approx 0.75$. Notably, the value of modeling correlation is identically zero across stress tests: the network's diversified portfolio absorbs joint-corridor disruptions using the same hedging mechanisms as single-corridor disruptions (\emph{structural joint-failure resilience}). Finally, LPG emerges as the most exposed commodity, whereas crude oil is fully hedgeable via reserves and pipeline bypasses.
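The tail-risk term of a mean-CVaR objective can be evaluated with the Rockafellar-Uryasev representation; a minimal sketch on an illustrative loss sample (not the paper's calibrated data):

```python
import numpy as np

# Empirical CVaR via the Rockafellar-Uryasev representation
#   CVaR_a(X) = min_t { t + E[(X - t)^+] / (1 - a) },
# whose minimizer t* is the value-at-risk. Sample and alpha are
# illustrative only.
def cvar(losses, alpha):
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)                  # t* = VaR_alpha
    return var + np.mean(np.maximum(losses - var, 0.0)) / (1.0 - alpha)

losses = np.arange(10.0)      # ten equally likely loss scenarios 0..9
risk = cvar(losses, 0.75)     # averages (roughly) the worst 25% tail
```

A mean-CVaR objective then trades off `np.mean(losses)` against `risk`; because the inner expression is a scenario-wise sum, the term stays decomposable inside a Benders scheme, as the abstract notes.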
math.OC 2026-05-13 2 theorems

Generative tilting recovers EOT plans for new marginals from reference samples

Generative Transfer for Entropic Optimal Transport with Unknown Costs

An iterative algorithm evolves couplings to new data distributions under a shared unknown cost, with convergence guarantees in Wasserstein-1 distance.

Figure from the paper.

abstract:
This paper addresses the practical challenge in Entropic Optimal Transport (EOT) where the underlying ground cost function is typically latent and unobserved. Rather than assuming a fixed geometric cost, we adopt a data-driven approach in which a shared cost is revealed only through samples from a reference optimal coupling. The question is then: given samples from a reference optimal coupling, can we recover the optimal coupling for new marginals under the same latent cost? We introduce a generative transfer framework that recovers the optimal coupling for new marginals via an iterative path-wise tilting algorithm. Unlike static importance reweighting, this method evolves the coupling jointly with a marginal transport path, allowing mass to move beyond the reference support. We derive sample-level learning rules for these infinitesimal updates, which yield covariance-type evolution equations for the associated transport vector fields. By integrating these dynamics with Conditional Flow Matching (CFM), we produce a practical sampler for paired data. Finally, we provide theoretical guarantees establishing a global convergence rate of $\mathcal{O}(\delta)$, ensuring the generated coupling converges to the target EOT plan in $W_1$ distance.
math.OC 2026-05-13 Recognition

ZOPPA converges at fixed temperature via smoothed proximal point

Convergence of zeroth-order proximal point algorithms in the high-temperature regime

Algorithm acts exactly on a temperature-smoothed function, giving convergence without vanishing temperature or exploding sample variance

Figure from the paper.

abstract:
Efficient methods for non-convex black-box optimization largely rely on sampling. In this context, the Zeroth-Order Proximal Operator (ZOPO) and the corresponding Zeroth-Order Proximal Point Algorithm (ZOPPA) have attracted significant interest, as they combine the advantage of requiring only objective evaluations with the powerful theoretical framework of proximal point algorithms. ZOPO depends on a temperature parameter which, when going to zero, reduces ZOPO to the exact proximal operator. By exploiting this property, the vanishing-temperature regime has been leveraged in several works to obtain theoretical guarantees via inexact proximal methods. However, this regime is computationally unsustainable when sampling is used to estimate ZOPO, since the corresponding Monte Carlo estimators suffer from severe variance issues. We therefore propose a comprehensive analysis of ZOPO for any fixed positive temperature, and prove convergence of ZOPPA under minimal assumptions on the objective function. We do so by demonstrating that ZOPPA can be interpreted as an exact proximal point method applied to an auxiliary smoothed objective, rather than an inexact method on the original function. Importantly, we further derive explicit guarantees connecting this smoothed problem back to the original objective and establish convergence results for the sampled method (S-ZOPPA) at a fixed temperature.
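The fixed-temperature ZOPO can be estimated by self-normalized importance sampling; a sketch under an illustrative quadratic objective and hand-picked temperature and proximal parameters (not the paper's S-ZOPPA implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(y):
    # toy black-box objective (illustrative; only evaluations are used)
    return np.sum(y**2, axis=-1)

def zopo(x, lam=0.5, temp=0.5, n=20000):
    # Monte Carlo zeroth-order proximal operator at a FIXED temperature:
    # the target density is p(y) ~ exp(-(f(y) + |y-x|^2/(2*lam)) / temp);
    # sampling the Gaussian factor N(x, lam*temp*I) and importance-
    # weighting by exp(-f(y)/temp) estimates its mean.
    y = x + np.sqrt(lam * temp) * rng.standard_normal((n, x.size))
    w = np.exp(-f(y) / temp)
    return (w[:, None] * y).sum(axis=0) / w.sum()

x = np.array([1.0, -1.0])
for _ in range(20):
    x = zopo(x)      # ZOPPA: iterate the smoothed proximal point map
```

For this quadratic the map contracts toward the origin by a constant factor per iteration, consistent with reading ZOPPA as an exact proximal point method on a temperature-smoothed objective.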
math.OC 2026-05-13 2 theorems

Spectral preconditioning converges for nonconvex constrained optimization

Constrained Stochastic Spectral Preconditioning Converges for Nonconvex Objectives

Proximal methods with geometry-tailored analysis handle heavy-tailed noise and both convex and nonconvex constraints.

Figure from the paper.

abstract:
In this work, we develop proximal preconditioned gradient methods with a focus on spectral gradient methods providing a proximal extension to the Muon and Scion optimizers. We introduce a family of stochastic algorithms that can handle a wide variety of convex and nonconvex constraints and study its convergence under heavy-tailed noise, through a novel analysis tailored to the geometry of the proposed methods. We further propose a variance-reduced version, which achieves faster convergence under standard noise assumptions. Finally, we show that the polynomial iterations used in Muon are more accurately captured by a nonlinear preconditioner than by the ideal matrix sign, leading to a convergence analysis that more faithfully reflects practical implementations.
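For context, the polynomial iterations the abstract refers to belong to the Newton-Schulz family; a sketch of the classical cubic variant, which converges to the orthogonal polar factor (the paper argues practical Muon is better modeled as a nonlinear preconditioner than as this ideal matrix sign):

```python
import numpy as np

rng = np.random.default_rng(1)

def newton_schulz(G, steps=50):
    # Classical cubic Newton-Schulz iteration: after scaling the spectral
    # norm to 1, X <- 1.5*X - 0.5*X X^T X drives all singular values to 1,
    # so X converges to the orthogonal polar factor U V^T of G. Muon uses
    # a tuned polynomial from this family; the cubic here is the textbook
    # variant, shown only for illustration.
    X = G / np.linalg.norm(G, 2)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

G = rng.standard_normal((4, 4))
Q = newton_schulz(G)   # numerically orthogonal: Q @ Q.T ~ I
```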
math.OC 2026-05-13 2 theorems

Proximal limited-memory quasi-Newton converges globally for nonconvex problems

Proximal Limited-Memory Quasi-Newton Methods for Nonsmooth Nonconvex Optimization

Adaptive regularization and compact updates guarantee progress when the objective splits into smooth and nonsmooth nonconvex parts.

Figure from the paper.

abstract:
We introduce a proximal limited-memory quasi-Newton scheme for minimizing the sum of a continuously differentiable function and a proper, lower semicontinuous, prox-bounded, possibly nonsmooth function. Both functions may be nonconvex. The method builds upon the computation of scaled proximal operators and is globalized by adaptively updating a regularization parameter based on a criterion of sufficient decrease. We prove global convergence under mild assumptions and then establish convergence of the entire sequence (with rates) under the Kurdyka-Lojasiewicz property. To efficiently solve the subproblems, we exploit the compact representation of limited-memory quasi-Newton updates. We also derive a compact representation of the limited-memory Kleinmichel formula, a rank-one quasi-Newton scheme that preserves positive definiteness under the same condition as the BFGS update. Numerical results show a significant speedup compared to other methods.
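For orientation, a limited-memory quasi-Newton inverse-Hessian product is classically computed by the two-loop recursion; the paper works with compact representations instead, so this is only the textbook illustration of limited-memory updates:

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    # Standard L-BFGS two-loop recursion: applies the inverse-Hessian
    # approximation built from the stored curvature pairs (s_k, y_k)
    # to `grad`.
    q = grad.copy()
    alphas = []
    for s, y in reversed(list(zip(s_list, y_list))):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if y_list:
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)        # initial scaling H0 = gamma * I
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / (y @ s)
        b = rho * (y @ q)
        q += (a - b) * s
    return q                          # approximates H^{-1} grad

# Sanity check on a diagonal quadratic: with curvature pairs from
# A = diag(2, 4), the recursion reproduces A^{-1} g exactly.
S = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Y = [np.array([2.0, 0.0]), np.array([0.0, 4.0])]
d = lbfgs_direction(np.array([2.0, 4.0]), S, Y)
```

The compact representation rewrites the same operator as a low-rank correction of the initial matrix, which is what makes the paper's scaled proximal subproblems tractable.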
math.OC 2026-05-13 2 theorems

Neumann boundary data observe waves on gas giant metrics

Boundary observability for gas giant metrics

An observability inequality is proven for singular metrics modeling acoustic waves in planets by reducing to the separable case and using an Ingham inequality.

abstract:
We study the observability of waves on gas giant manifolds, a class of Riemannian manifolds whose metrics are singular at the boundary. Such manifolds arise naturally in the modeling of acoustic wave propagation in gas giant planets. We establish an observability inequality using full boundary measurements given by a Neumann-type trace that is natural in the gas giant setting. The proof proceeds in two steps. First, observability for a general gas giant metric is reduced to the so-called separable case via a perturbation argument. In the separable case, we employ a uniform-in-tangential-frequency analysis combined with an Ingham inequality to prove observability.
math.OC 2026-05-13 Recognition

HS-Jacobian lets Adam train neural nets with linear constraints

Efficient and provably convergent end-to-end training of deep neural networks with linear constraints

Proving the Jacobian is conservative for polyhedral projections supplies both efficient backpropagation and convergence guarantees.

Figure from the paper.

abstract:
Training a deep neural network with the outputs of selected layers satisfying linear constraints is required in many contemporary data-driven applications. While this can be achieved by incorporating projection layers into the neural network, end-to-end training remains challenging due to the lack of rigorous theory and efficient algorithms for backpropagation. A key difficulty arises from the nonsmoothness of the solution mapping of the projection layer. To address this bottleneck, we introduce an efficiently computable HS-Jacobian for the projection layer. Importantly, we prove that the HS-Jacobian is a conservative mapping for the projection operator onto a polyhedral set, enabling its seamless integration into the nonsmooth automatic differentiation framework for backpropagation. Consequently, many efficient algorithms, such as Adam, can be applied for end-to-end training of deep neural networks with linear constraints. In particular, we establish convergence guarantees for the HS-Jacobian-based Adam algorithm when training linearly constrained deep neural networks. Extensive experimental results on several important applications, including finance, computer vision, and network architecture design, demonstrate the superior performance of our method compared to other popular existing methods.
math.OC 2026-05-13 3 theorems

Barrier smoothing yields O(K^{-2/3}) stationarity for constrained bilevel opt

A Barrier-Metric First-Order Method for Linearly Constrained Bilevel Optimization

First-order proxy-gradient method uses only gradients plus explicit barrier Hessian and avoids lower-level linear solves.

Figure from the paper.

abstract:
We study bilevel optimization with a fixed polyhedral lower feasible set. Such problems are challenging for two reasons: active-set changes can make the upper objective nonsmooth, and existing hypergradient methods typically require lower-Hessian inversions or equivalent linear solves, which are computationally expensive. To address these issues, we adopt a logarithmic barrier smoothing of the lower problem to obtain a differentiable approximation of the constrained bilevel objective, and develop a proxy-gradient algorithm for the resulting barrier-smoothed surrogate. The algorithm uses only gradients of the upper and lower objectives; its only second-order object is the explicit logarithmic barrier Hessian determined by the fixed polyhedral constraints. Barrier smoothing restores differentiability, but Euclidean smoothness constants are not uniformly bounded near the boundary. We therefore develop a local Dikin-geometry analysis in which the barrier-metric provides an oracle-free curvature scale near the moving lower centers. This leads to barrier-aware schedules that keep the iterates inside locally well-behaved regions. For the barrier-smoothed objective, we prove stationarity rates of $\widetilde{O}(K^{-2/3})$ in the deterministic setting and $\widetilde{O}(K^{-2/5})$ under upper-level-only bounded stochastic noise after $K$ outer iterations, together with quantitative bias control as the barrier parameter decreases.
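The "explicit logarithmic barrier Hessian" is cheap to form: for the polyhedron $\{x : Ax \le b\}$ it is $A^\top \mathrm{diag}\big(1/(b-Ax)^2\big) A$. A sketch with illustrative data (not the paper's problem instances):

```python
import numpy as np

# Gradient and Hessian of the logarithmic barrier
#   B(x) = -sum_i log(b_i - a_i^T x)
# for the polyhedron {x : A x <= b}; A, b, x are illustrative.
def barrier_grad_hess(A, b, x):
    slack = b - A @ x                  # must be > 0 (strictly feasible x)
    g = A.T @ (1.0 / slack)
    H = A.T @ np.diag(1.0 / slack**2) @ A
    return g, H

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 0.5])
x = np.array([0.2, 0.1])
g, H = barrier_grad_hess(A, b, x)
# H defines the local Dikin metric ||v||_H used to scale steps.
```

Measuring steps in the Dikin norm $\|v\|_H$ keeps them well-behaved even where Euclidean smoothness constants blow up near the boundary, which is the role this Hessian plays in the paper's schedules.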
math.OC 2026-05-13 2 theorems

DNN relaxation exact for random quadratics with high probability

Exactness of the DNN Relaxation for Random Standard Quadratic Programs

Small defect components let the convex relaxation decompose into exact low-dimensional problems and recover a unique rank-one solution in O(n^2) time.

abstract:
We study the doubly nonnegative (DNN) relaxation of the standard quadratic optimization problem \[ \min\{x^\top Qx:\ x\in\Delta^{n-1}\},\qquad \Delta^{n-1}:=\{x\in\mathbb{R}_+^n:\ \mathbb{1}^\top x=1\}, \] for random symmetric matrices with independent diagonal and off-diagonal entries. Let $m_n:=\min_{1\le i\le n} Q_{ii}$ and set $M:=Q-m_nE$, where $E$ is the all-ones matrix. The negative off-diagonal entries of $M$ define a defect graph $G_n^-$. Under entrywise independence, absolute continuity, and the tail-decay condition $n^5\mathbb{E}[F_O(m_n)^4]\to 0$, where $F_O$ is the off-diagonal distribution function, we prove that with probability tending to one every defect component has size at most $4$. On this event, the shifted DNN value decomposes over defect components. Since the DNN and completely positive cones coincide in dimensions at most four, each local relaxation is exact. A finite KKT-candidate argument gives local uniqueness, and absolute continuity rules out ties, so the global DNN optimizer is unique and rank one. The graph estimate uses the fact that every connected component of size at least five contains a tree on exactly five vertices. For Gaussian orthogonal ensemble data, we prove the explicit bound \[ \mathbb{P}\bigl(\text{the DNN optimizer is unique and rank one}\bigr) \ge 1-K\frac{(\ln n)^2}{n^3}. \] On the same event, the exact optimizer can be recovered in $O(n^2)$ time by constructing the defect graph and solving constant-size local KKT systems. We also verify the tail condition for variance-tuned Gaussian Wigner models, heavy-tailed laws, and finite-lower-endpoint laws.
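The defect-graph construction is elementary to replicate; a tiny hand-made instance (illustrative numbers, not a proof-scale experiment):

```python
import numpy as np

# Defect graph of a small StQP instance: shift Q by its smallest diagonal
# entry, connect i~j whenever the shifted off-diagonal entry is negative,
# and read off connected components.
Q = np.array([[1.0, -2.0, 0.0],
              [-2.0, 1.0, 3.0],
              [0.0, 3.0, 2.0]])
m = Q.diagonal().min()
M = Q - m * np.ones_like(Q)            # M = Q - m_n * E

adj = (M < 0) & ~np.eye(len(Q), dtype=bool)

def components(adj):
    seen, comps = [], []
    for s in range(len(adj)):
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(np.flatnonzero(adj[v]).tolist())
        seen += list(comp)
        comps.append(sorted(comp))
    return comps

comps = components(adj)
# Here M[0,1] and M[0,2] are negative, so vertices 0, 1, 2 form one
# defect component.
```

In the paper's random regime all components have size at most 4 with high probability, which is what lets the DNN value decompose and the local problems be solved exactly.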
math.OC 2026-05-13 2 theorems

Schrödinger bridge solves sub-Riemannian optimal transport

From Schrodinger Bridge to Optimal Transport over Sub-Riemannian Manifolds

Entropic regularization with aligned noise yields a Sinkhorn algorithm that recovers the deterministic problem in the zero-noise limit.

Figure from the paper.

abstract:
We study the least-energy way to reshape a probability distribution when motion is constrained to a horizontal bundle, that is, optimal transport and distribution steering in sub-Riemannian geometry, motivated by density control over underactuated systems. To obtain a continuous and numerically tractable formulation, we introduce an entropic regularization by adding small noise aligned with the control directions and study the associated Schrodinger bridge problem. The resulting reference process is a degenerate diffusion on the sub-Riemannian manifold. Under bracket-generating hypotheses we obtain smooth, strictly positive transition densities and a forward--backward characterization of the optimal bridge. This leads to a practical Sinkhorn-type algorithm for the Schrodinger potentials and, as the noise level vanishes, a recovery of the deterministic sub-Riemannian optimal transport problem. We demonstrate with a numerical example.
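The Sinkhorn-type step is the standard alternating scaling of the kernel matrix; a discrete sketch where a Euclidean cost stands in for the sub-Riemannian action (which in the paper comes from the degenerate reference diffusion):

```python
import numpy as np

# Plain Sinkhorn iteration for a discrete entropic OT problem. The cost
# matrix here is a Euclidean stand-in; in the paper's setting the kernel
# is the transition density of the reference diffusion.
def sinkhorn(mu, nu, C, eps=0.1, iters=500):
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]   # coupling with marginals mu, nu

x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
C = (x[:, None] - y[None, :])**2
mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
P = sinkhorn(mu, nu, C)
```

As `eps` decreases, the coupling concentrates on the cost-minimizing assignment, mirroring the paper's zero-noise recovery of the deterministic transport problem.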
math.OC 2026-05-13 Recognition

Certificate establishes exact worst-case rate for gradient descent when N >= 3

The Grimmer-Shu-Wang Certificate and the Drori-Teboulle Minimax Nonnegative Constant-Stepsize Bound for N >= 3

Positive vectors a, b, c, d satisfying the residual equations yield the upper bound that matches lower bounds from quadratic and Huber cases

abstract:
We prove, for every horizon N >= 3, the existence of the strengthened low-rank performance-estimation certificate proposed by Grimmer, Shu, and Wang for the Drori-Teboulle minimax nonnegative constant-stepsize problem for gradient descent. Let rho_N in (0,1) be the unique solution of rho_N^{2N}(2N rho_N+2N+1)=1. We show that the GSW certificate equations admit positive vectors a, b, c, d satisfying all residual equations. The proof proceeds through a reduced residual system in the variables d, a simplex existence argument for a positive reduced zero, a terminal residual completion identity, and a tail-square convolution argument proving the cumulative margins that force positivity of the certificate coefficients. Consequently, the GSW low-rank PEP certificate exists for every N >= 3 and yields the Drori-Teboulle upper bound. Together with the one-dimensional quadratic and Huber lower-bound examples, this establishes the Drori-Teboulle minimax nonnegative constant-stepsize value for gradient descent for every N >= 3.
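The constant rho_N is easy to compute numerically, since the defining function is increasing on (0,1); a bisection sketch:

```python
# Solve rho^(2N) * (2N*rho + 2N + 1) = 1 for the unique root in (0, 1)
# by bisection; f is negative at 0, positive at 1, and increasing.
def rho(N, tol=1e-12):
    f = lambda r: r**(2*N) * (2*N*r + 2*N + 1) - 1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

r3 = rho(3)   # the constant for horizon N = 3
```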
math.OC 2026-05-13 2 theorems

Reputation learning closes loop on Byzantine consensus

Byzantine-Resilient Consensus via Active Reputation Learning

Active reputation vectors built inside the consensus loop suppress adversaries and improve agreement among normal agents.

Figure from the paper.

abstract:
This paper proposes a Byzantine-resilient consensus framework that simultaneously pursues two tightly coupled objectives: actively identifying Byzantine agents and guaranteeing resilient consensus among normal agents. Unlike existing methods that treat adversary mitigation as a passive filtering process, our approach embeds an active reputation learning mechanism into the consensus loop. Agents evaluate neighbors' behaviors using outlier-robust loss functions and historical information, and construct a reputation vector on a probability simplex via a mechanism that balances loss minimization with diversity-preserving exploration, representing dynamic beliefs over neighbor trustworthiness. These reputations are then used to form weighted local updates that suppress adversarial influence and improve agreement among normal agents, thereby reducing the bias in local loss evaluations and enabling more reliable subsequent reputation estimation. This learning-control co-design yields a closed-loop dual objective: improved consensus states enhance Byzantine identifiability, while refined reputations in turn improve consensus. A range of distributed systems experiments, benchmarking against classical resilient consensus methods, demonstrate superior Byzantine detection accuracy and significantly more reliable and scalable consensus.
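A heavily simplified sketch of the reputation-weighted update: each normal agent scores neighbors with a loss (here a plain squared deviation, an illustrative stand-in for the paper's outlier-robust losses and exploration mechanism) and forms softmax reputation weights on the simplex before averaging:

```python
import numpy as np

# Toy reputation-weighted consensus; agent 3 is Byzantine and holds a
# constant outlier state. Loss, beta, and eta are illustrative choices.
states = np.array([1.0, 1.2, 0.9, 10.0])

def step(states, beta=2.0, eta=0.5):
    new = states.copy()
    for i in range(3):                      # normal agents only
        loss = (states - states[i])**2      # stand-in neighbor loss
        rep = np.exp(-beta * loss)
        rep /= rep.sum()                    # reputation vector on simplex
        new[i] = (1 - eta) * states[i] + eta * rep @ states
    return new

for _ in range(50):
    states = step(states)
# The outlier's reputation weight is ~exp(-beta * 81), so normal agents
# reach agreement essentially uninfluenced by it.
```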
math.OC 2026-05-12 2 theorems

Mirror descent computes exact barycenters for discrete and continuous measures

A Unified Approach for Computing Wasserstein Barycenters of Discrete and Continuous Measures

Starting from any density, the algorithm solves semi-discrete transport problems and converges to the true barycenter even when all inputs are discrete.

Figure from the paper.

abstract:
Computing the unregularized Wasserstein barycenter for measure-valued data is a challenging optimization task. Recent algorithms have been tailored to either discrete measures as point clouds or continuous measures discretized on regular grids. In this work, we propose a primal mirror descent algorithm for computing the exact Wasserstein barycenter in the Fisher-Rao geometry. Our algorithm is a unified approach that is flexible enough to simultaneously cover discrete and absolutely continuous input measures, with convergence guarantees in both settings. In particular, when all input measures are discrete, our algorithm, initialized from any probability density, solves a sequence of semi-discrete optimal transport subproblems and produces absolutely continuous iterates that converge to the discrete barycenter. We use synthetic and real data examples to demonstrate the promising result in terms of accuracy and computational cost.
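In discrete form, mirror descent in the Fisher-Rao geometry is the multiplicative (entropic) update; a toy sketch minimizing a linear cost over the simplex, standing in for the paper's iterations whose "gradients" come from semi-discrete OT subproblems:

```python
import numpy as np

# Entropic mirror descent (discrete Fisher-Rao multiplicative update)
# minimizing <c, p> over the probability simplex; c and eta illustrative.
c = np.array([0.3, 0.1, 0.5])
p = np.ones(3) / 3            # start from any positive density
eta = 0.5
for _ in range(200):
    p = p * np.exp(-eta * c)  # multiplicative update
    p /= p.sum()              # renormalize onto the simplex
# Mass concentrates on the lowest-cost atom (index 1 here).
```

The iterates stay strictly positive throughout, which is the discrete analogue of the paper's absolutely continuous iterates converging to a discrete barycenter.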
math.OC 2026-05-12 2 theorems

Projected JSR can be strictly smaller than γ for deflated Q-VI

Switching-Geometry Analysis of Deflated Q-Value Iteration

Quotienting out the all-ones direction from the switching model yields a tighter convergence-rate bound while leaving the greedy policies unchanged.

Figure from the paper.

abstract:
This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has JSR exactly the discount factor $\gamma\in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $\gamma$. Therefore, the deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $\gamma$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry after the redundant all-ones component is removed.
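The scalar-recentering equivalence is easy to see numerically: subtracting the mean (the all-ones component) after each Bellman update shifts every entry equally, so greedy policies coincide. A toy sketch with an illustrative 2-state, 2-action MDP:

```python
import numpy as np

# Toy MDP (illustrative numbers): Q-value iteration with and without an
# all-ones "deflation" (scalar recentering) after each update.
gamma = 0.9
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])                     # r[s, a]
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.5, 0.5], [0.9, 0.1]]])       # P[s, a, s']

def bellman(Q):
    v = Q.max(axis=1)                          # greedy state values
    return r + gamma * P @ v

Q_std = np.zeros((2, 2))
Q_def = np.zeros((2, 2))
for _ in range(200):
    Q_std = bellman(Q_std)
    Q_def = bellman(Q_def)
    Q_def -= Q_def.mean()                      # remove all-ones component

pi_std = Q_std.argmax(axis=1)
pi_def = Q_def.argmax(axis=1)
# Q_std - Q_def is a constant matrix, so the greedy policies agree.
```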
math.OC 2026-05-12 Recognition

Single network solves optimal transport via proximal fixed points

Fixed-Point Neural Optimal Transport without Implicit Differentiation

Reformulating the c-transform enforces dual feasibility exactly and skips adversarial training plus implicit differentiation.

Figure from the paper.

abstract:
We propose an implicit neural formulation of optimal transport that eliminates adversarial min--max optimization and multi-network architectures commonly used in existing approaches. Our key idea is to parameterize a single potential in the Kantorovich dual and reformulate the associated c-transform as a proximal fixed-point problem. This yields a stable single-network framework in which dual feasibility is enforced exactly through proximal optimality conditions rather than adversarial training. Despite the inner fixed-point computation, gradients can be computed without differentiating through the fixed-point iterations, enabling efficient training without requiring implicit differentiation. We further establish convergence of stochastic gradient descent. The resulting framework is efficient, scalable, and broadly applicable: it simultaneously recovers forward and backward transport maps and naturally extends to class-conditional settings. Experiments on high-dimensional Gaussian benchmarks, physical datasets, and image translation tasks demonstrate strong transport accuracy together with improved training stability and favorable computational and memory efficiency.
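In the discrete case the c-transform the paper reformulates is an explicit minimum; a sketch of the exact version that any fixed-point reformulation must reproduce (points, cost, and potential are illustrative):

```python
import numpy as np

# Discrete c-transform: given a potential phi on source points, the
# tightest feasible dual partner is psi(y) = min_x [ c(x, y) - phi(x) ].
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
C = (x[:, None] - y[None, :])**2        # c(x, y) = |x - y|^2
phi = np.array([0.0, 0.2, 0.1])

psi = (C - phi[:, None]).min(axis=0)    # c-transform of phi
# (phi, psi) is dual-feasible by construction:
# phi[i] + psi[j] <= C[i, j] for all i, j.
feasible = np.all(phi[:, None] + psi[None, :] <= C + 1e-12)
```

Parameterizing only `phi` and computing `psi` this way enforces dual feasibility exactly, which is the property the paper's proximal fixed-point reformulation preserves without a second network.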
math.OC 2026-05-12 2 theorems

Gradient descent reaches only global minima in wide shallow nets

On the global convergence of gradient descent for wide shallow models with bounded nonlinearities

Non-global minimizers are unstable in the mean-field limit when initial parameters have full support, covering attention layers and vector-output sigmoid networks.

A surprising phenomenon in the training of neural networks is the ability of gradient descent to find global minimizers of the training loss despite its non-convexity. Following earlier works, we investigate this behavior for wide shallow networks. Existing results essentially cover the case of ReLU activations and the case of sigmoid activations with scalar output weights. We study a large class of models that includes multi-head attention layers and two-layer sigmoid networks with vector output weights. Building upon [Chizat and Bach, 2018], we prove that all non-global minimizers of the training loss are unstable under gradient descent dynamics. Thus, when the initial distribution of the parameters has full support (which includes the popular Gaussian case), and in the many hidden neurons or attention heads limit, continuous-time gradient descent can only converge to global minimizers. Establishing the instability of non-global minimizers corresponds to the construction of an ``escaping active set'' -- we complete the proof of [Chizat and Bach, 2018] to construct this set for models with bounded nonlinearities and scalar output weights. We also extend this construction to new cases for models with vector output weights. Finally, we show the well-posedness and the stability with respect to discretization of the mean field training dynamic for sub-Gaussian initializations.
math.OC 2026-05-12 2 theorems

Decentralized MPC with safe sets guarantees multi-agent collision avoidance

Decentralized Contingency MPC based on Safe Sets for Nonlinear Multi-agent Collision Avoidance

Contingency certificates and geometric updates enable safe motion for nonlinear agents using only observed states and no trajectory sharing.

Decentralized collision avoidance remains challenging, particularly when agents do not communicate any information related to planned trajectories. Most existing approaches either rely on conservative coordination mechanisms or provide limited guarantees on recursive feasibility and convergence. This paper develops a decentralized contingency MPC framework for multi-agent systems with nonlinear dynamics that achieves collision-free motion under a state-only information pattern. Each agent follows the same consensual rule set, enabling safe decentralized planning without communication. Each agent solves a local optimization problem that couples a nominal trajectory with a contingency certificate ensuring a feasible backup maneuver under receding-horizon operation. A novel geometric and decentralized safe-set update mechanism prevents feasibility loss between consecutive time steps. The resulting scheme guarantees recursive feasibility, including collision avoidance, and establishes a Lyapunov-type convergence result to an admissible safe equilibrium. Simulation results demonstrate performance in both sparse and dense multi-agent environments, including cluttered bottleneck scenarios and under plug-and-play operation.
math.OC 2026-05-12 2 theorems

Exponential bound proven for LCP sufficient-matrix handicaps

Handicap reduction for linear complementarity problems

The bound depends only on input bit length and is paired with an ellipsoid-based algorithm that minimizes the effective handicap via row and column rescaling.

Linear Complementarity Problems (LCPs) with sufficient matrices form an important subclass of LCPs, and it remains a significant open question whether problems in this class can be solved in polynomial time. Kojima, Megiddo, Noma, and Yoshise gave an Interior Point Algorithm (IPA) in 1991 that can solve LCPs with sufficient matrices in time bounded polynomially in the input size and the so-called handicap number $\hat\kappa(M)$ of the coefficient matrix $M$. However, this value can be exponentially large in the bit encoding length. In fact, no upper bounds were previously known on $\hat\kappa(M)$. Settling an open question raised in de Klerk and E.-Nagy (Math Programming, 2011), we give an exponential upper bound on $\hat\kappa(M)$ in the bit-complexity of $M$. This is based on a new characterization of sufficient matrices. The new characterization also leads to a simple new proof of V\"aliaho's theorem on the equivalence of sufficient and $\mathcal{P}^*$-matrices (Linear Algebra and its Applications, 1996). Noting that one can obtain an equivalent LCP by rescaling the rows and columns by a positive diagonal matrix, we define $\hat\kappa^\star(M)$ as the best possible handicap number achievable under such rescalings. Our second main result is an algorithm for LCPs with sufficient matrices, where the running time is polynomially bounded in the input size and in the optimized value $\hat\kappa^\star(M)$. This algorithm is based on the observation that the set of near-optimal row-rescalings forms a convex set. Our algorithm combines the Ellipsoid Method over the set of row rescalings, and an IPA with running time dependent on the handicap number of the matrix. If the IPA fails to solve the LCP in the desired running time, it provides a separation oracle to the Ellipsoid Method to find a better rescaling.
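The rescaling observation in the abstract — a positive diagonal rescaling of rows and columns yields an equivalent LCP — can be illustrated on a tiny instance. A brute-force sketch (the matrix, vector, choice of $D$, and the enumeration solver are all illustrative assumptions; the paper's ellipsoid-based algorithm is not reproduced):

```python
import numpy as np
from itertools import product

def solve_lcp_enum(M, q):
    # Brute-force LCP solver: for each candidate support S, set w_S = 0 and
    # z_i = 0 outside S, then check nonnegativity of both z and w = Mz + q.
    n = len(q)
    for mask in product([0, 1], repeat=n):
        S = [i for i in range(n) if mask[i]]
        z = np.zeros(n)
        if S:
            try:
                z[S] = np.linalg.solve(M[np.ix_(S, S)], -q[S])
            except np.linalg.LinAlgError:
                continue
        w = M @ z + q
        if (z >= -1e-9).all() and (w >= -1e-9).all():
            return z
    return None

# Illustrative positive definite (hence sufficient) matrix and data.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
q = np.array([-1.0, 1.0, -1.0])
D = np.diag([2.0, 0.5, 3.0])               # positive diagonal rescaling

z = solve_lcp_enum(M, q)
z_scaled = solve_lcp_enum(D @ M @ D, D @ q)
# Equivalence: the rescaled LCP's solution is D^{-1} z.
```

If $z$ solves LCP$(M, q)$, then $z' = D^{-1} z$ solves LCP$(DMD, Dq)$: the slack becomes $w' = Dw \ge 0$ and the complementarity product $z'^\top w' = z^\top w = 0$ is unchanged.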
math.OC 2026-05-12 Recognition

New moves link all incomplete tournament schedules

Novel neighborhood structures for incomplete round robin sports tournaments

A single-round change structure connects every feasible schedule, letting search algorithms find better sports timetables.

The incomplete round robin sports tournament format, where each team plays the same number of games but faces only a subset of the other teams, is becoming increasingly popular in both youth and professional competitions. In contrast to conventional round robin tournaments, however, neighborhood structures for scheduling incomplete round robin tournaments have largely remained unexplored. We fill this gap by proposing two novel neighborhood structures, which we describe in graph-theoretic terms. One of them introduces a single new game followed by a minimal repair chain, while the other introduces possibly many new games but affects only a single round. The latter is shown to fully connect the solution space. We embed the neighborhoods in an adaptive late acceptance hill climbing algorithm and show that the proposed algorithm obtains high-quality and new best solutions for several sets of instances from the literature, thereby empirically confirming the effectiveness of the proposed neighborhoods.
math.OC 2026-05-12 2 theorems

Frank-Wolfe lower bound matches upper bound on p-uniformly convex sets

Curvature-Dependent Lower Bounds for Frank-Wolfe

The Omega of T to the minus p over p minus one rate holds for exact line search and short steps on simple instances.

The Frank-Wolfe algorithm achieves a convergence rate of $\mathcal{O}(1/T)$ for smooth convex optimization over compact convex domains, accelerating to $\mathcal{O}(1/T^2)$ when both the objective and the feasible set are strongly convex. This acceleration extends beyond strong convexity: Kerdreux et al. (2021a) proved rates of $\mathcal{O}(T^{-p/(p-1)})$ over $p$-uniformly convex feasible sets, a class that interpolates between strongly convex sets and more general curved domains such as $\ell_p$ balls. In this work, we establish a matching $\Omega(T^{-p/(p-1)})$ lower bound for every $p\ge 3$ under exact line search or short steps, and extend the lower bound to objectives satisfying a H\"olderian error bound. The proofs analyze the dynamics of Frank-Wolfe iterates on simple instances and hence are not limited to the high-dimensional setting, unlike information-theoretic lower bounds.
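For context, the upper-bound side that this lower bound matches can be simulated directly: Frank-Wolfe with short steps over an $\ell_p$ ball, whose linear minimization oracle has a closed form by Hölder duality. A sketch under illustrative assumptions (quadratic objective, unit $\ell_3$ ball, target outside the ball); the paper's lower-bound instances are not reproduced:

```python
import numpy as np

def lmo_lp_ball(g, p, r=1.0):
    # argmin_{||s||_p <= r} <g, s>: closed form from Holder duality,
    # attaining the value -r * ||g||_q with q = p / (p - 1).
    q = p / (p - 1.0)
    a = np.sign(g) * np.abs(g) ** (q - 1.0)
    return -r * a / np.linalg.norm(g, q) ** (q - 1.0)

p = 3.0                                   # 3-uniformly convex feasible set
b = np.array([2.0, 1.0, -1.5])            # target outside the unit l_p ball
x = lmo_lp_ball(-b, p)                    # feasible boundary start
gaps = []
for t in range(2000):
    g = x - b                             # gradient of f(x) = 0.5 ||x - b||^2, L = 1
    s = lmo_lp_ball(g, p)
    d = x - s
    gaps.append(g @ d)                    # Frank-Wolfe gap
    step = min(1.0, (g @ d) / (d @ d + 1e-18))   # short step for L = 1
    x = x + step * (s - x)
```

With the minimizer on the boundary and a nonvanishing gradient there, the gap sequence decays at the curvature-accelerated rate rather than the generic $\mathcal{O}(1/T)$ rate.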
math.OC 2026-05-12 2 theorems

Riemannian L-BFGS handles Euclidean bounds on manifolds

A Riemannian quasi-Newton algorithm for optimization with Euclidean bounds

Tangent-space updates and an adapted Cauchy point strategy match classical performance on Euclidean problems and deliver large speedups on two mixed manifold-Euclidean problems.

We propose a Riemannian limited-memory BFGS method for optimization problems with Euclidean bounds. The method combines a limited-memory quasi-Newton update in the tangent space with a Riemannian adaptation of the generalized Cauchy point strategy from classical L-BFGS-B, enabling efficient handling of Euclidean bounds while exploiting the geometric structure of the optimization domain. This setting is important in several applications, including covariance matrix estimation with bounded variance, neuroimaging, EEG signal classification, and other signal processing or computer-vision tasks that couple manifold variables with constrained Euclidean parameters. We provide a generic algorithmic framework and an implementation of the algorithm in the Manopt.jl library. Numerical experiments on benchmark problems indicate only minor reduction in performance on Euclidean problems compared to the classical L-BFGS-B method, while outperforming interior-point methods. Furthermore, the algorithm was tested on two mixed manifold and bounded Euclidean problems: amplitude-limited blind source separation with Gaussianity penalization and bounded-variance maximum likelihood common principal components analysis. The proposed method outperforms existing methods by several orders of magnitude.
math.OC 2026-05-12 Recognition

8/3 approximation for matroid-constrained randomized vertex-cover interdiction

Randomized Max-Vertex-Cover Interdiction with Matroid Constraints

Leader randomizes protections by relaxing the attacker's problem and shifting to vertex distributions, yielding a polynomial-time guarantee.

We study a new bilevel optimization problem, termed the Randomized Max-Vertex-Cover Interdiction (RMVCI) problem under matroid constraints, which can be modeled as a zero-sum Stackelberg game on a network between a leader and a follower. The leader randomly selects a subset of vertices to protect, subject to a matroid constraint, while the follower, after inferring the leader's protection probability distribution, chooses a subset of vertices (also matroid-constrained) to attack, aiming to maximize the expected total weight of edges incident to the set of vertices that are both attacked and unprotected. The leader's objective is to determine an optimal randomized interdiction strategy that minimizes the follower's expected payoff. Since the follower's response problem is NP-hard, the resulting bilevel program is computationally challenging. We develop a conceptual approximation framework for tackling general bilevel interdiction problems. For the RMVCI problem under matroid constraints, we first formulate the follower's problem as an integer linear program and show that its linear relaxation admits a tight integrality gap of $\tfrac{4}{3}$. Within the approximation framework, we replace the follower's problem by its LP relaxation, and then study the resulting bilevel program. By shifting from distributions over sets to distributions over vertices and applying our approximation framework, we manage to design a polynomial-time 2-approximation algorithm for this relaxed bilevel problem. Combining these ingredients within our framework yields a polynomial-time $\tfrac{8}{3}$-approximation algorithm for RMVCI under matroid constraints.
math.OC 2026-05-12 Recognition

Backstepping observer stabilizes error in blood flow cascade models

Observer Design for a Class of ODE -- Continuum-PDE Cascade Systems Inspired by a Control-Theoretic Model of Large-Scale Arterial Networks of Blood Flow

Design for ODE-continuum-PDE systems as limits of artery networks proves exponential convergence from average measurements.

We develop a backstepping-based observer design for a class of ODE - continuum-PDE cascade systems, which can be viewed as the limit of a finite collection of ODE - $2 \times 2$ hyperbolic systems as the number of individual PDE system components tends to infinity. The large-scale collection of ODE - $2 \times 2$ hyperbolic systems is motivated by a dynamic model, which we present, of a network of peripheral arteries into which central (aortic) blood flow/pressure enters. We address a case in which average (boundary) measurements, over the ensemble dimension, are available, which is motivated by the availability of non-invasive, peripheral flow/pressure measurements. Exponential stability of the estimation error system is shown by proving well-posedness of the kernel equations and constructing a Lyapunov functional. We also establish that some of the derived backstepping kernels coincide with the solution of a Sylvester equation. We then apply the continuum-based observer for state estimation of the large-scale counterpart and, in particular, of the blood flow system, introducing an approach for optimal construction of continuum approximations. We also introduce an implementation method, adopting a spectral-based approach for computing the observer dynamics, which we illustrate in an academic, numerical simulation example. Furthermore, we illustrate the design in the problem of central flow/pressure estimation using realistic parameters and flow/pressure waveforms.
math.OC 2026-05-12 2 theorems

Bound certifies any learned controller for unknown linear systems

A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems

The guarantee applies to stochastic policies and unbounded quadratic costs, with algorithms for finite or infinite controller sets.

This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.
math.OC 2026-05-12 2 theorems

XP algorithms and W[1]-hardness classify stationarity testing for PA functions in fixed d

Parameterized Complexity of Stationarity Testing for Piecewise-Affine Functions and Shallow CNN Losses

The classification covers local-minimality checks and extends to shallow ReLU CNN weight-space stationarity, clarifying tractability bounds.

We study the parameterized complexity of testing approximate first-order stationarity at a prescribed point for continuous piecewise-affine (PA) functions, a basic task in nonsmooth optimization. PA functions form a canonical model for nonsmooth stationarity testing and capture the local polyhedral geometry that appears in ReLU-type training losses. Recent work by Tian and So (SODA 2025) shows that testing approximate stationarity notions for PA functions is computationally intractable in the worst case, and identifies fixed-dimensional tractability as an open direction. We address this direction from the viewpoint of parameterized complexity, with the ambient dimension $d$ as the parameter. In this paper, we give XP algorithms in fixed dimension for the tractable sides, and prove W[1]-hardness for the complementary sides. Moreover, lower bounds under the Exponential Time Hypothesis rule out algorithms running in time $\rho(d)\size^{o(d)}$ for any computable function $\rho$, where $\size$ denotes the total binary encoding length of the stationarity-testing instance. As a further consequence, our results yield the corresponding parameterized complexity picture for testing local minimality of continuous PA functions. We further extend our hardness results to a family of shallow ReLU CNN training losses, with stationarity tested in the trainable weight space. Thus, the same parameterized-complexity picture also appears for simple CNN training losses.
math.OC 2026-05-12 2 theorems

Transformation stabilizes ODE-wave cascade with boundary disturbances

Stabilization for a Cascaded ODE-Wave Equation with Boundary Nonlinear Disturbances

A mapping of PDE inputs to the ODE and an estimator-observer scheme achieve exponential stability and boundedness for the closed-loop system.

In this article, we investigate the problem of exponential stabilization via output feedback for a cascaded system composed of an ordinary differential equation (ODE) and a wave partial differential equation (PDE) under boundary control. Four types of boundary interconnections are considered. In the absence of disturbances, a novel transformation is introduced to incorporate the PDE boundary control input into the ODE subsystem. Based on this transformation, a state feedback controller is designed to achieve exponential stability for both the ODE and PDE components. When internal uncertainties and external disturbances that match the control structure are present, a disturbance estimator is constructed. Utilizing this estimator, a Luenberger-type state observer is developed to reject the disturbances and exponentially stabilize the original system via an observer-based control scheme. Furthermore, the boundedness of the entire closed-loop system is rigorously established. Numerical simulations are provided to illustrate the theoretical results.
math.OC 2026-05-11 2 theorems

Power law model splits Muon and SignSGD into three phases

Phases of Muon: When Muon Eclipses SignSGD

SignSVD preconditioning wins uniformly in one spectral regime, SignSGD in another, and the methods trade off in a third region of the alpha, beta plane.

Recently, Muon and related spectral optimizers have demonstrated strong empirical performance as scalable stochastic methods, often outperforming Adam. Yet their behaviour remains poorly understood. We analyze stochastic spectral optimizers, including Muon, on a high-dimensional matrix-valued least squares problem. We derive explicit deterministic dynamics that provide a tractable framework for studying learning behaviour with a focus on (stochastic) SignSVD, which Muon approximates, and (stochastic) SignSGD, the latter serving as a proxy for Adam. Our analysis shows that for large batch size, SignSVD performs a square-root preconditioning with respect to the data covariance spectrum, while for small batch size smaller eigenmodes behave like SGD, slowing down convergence. We contrast with SignSGD which for generic covariance performs no preconditioning and has no transition, leading to different optimal learning rates and convergence characteristics. The two methods match up to a constant factor with isotropic data, but behave differently with anisotropic data. An analysis of a power law covariance model with data exponent $\alpha$ and target exponent $\beta$ shows there are three phases in the $(\alpha,\beta)$ plane: one where SignSGD is uniformly favored, one where SignSVD is uniformly favored, and a third where the two methods exhibit a trade-off in performance.
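The two updates being contrasted have compact forms: SignSVD, which Muon approximates, replaces every singular value of the gradient with one, while SignSGD takes the entrywise sign. A full-batch sketch on a toy matrix least-squares problem (the sizes, learning rate, and step count are illustrative assumptions; the paper's deterministic dynamics are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))        # data matrix (illustrative sizes)
W_star = rng.standard_normal((10, 4))
Y = X @ W_star                            # realizable matrix least-squares target

def loss(W):
    return 0.5 * ((X @ W - Y) ** 2).sum() / len(X)

def grad(W):
    return X.T @ (X @ W - Y) / len(X)

def signsvd_dir(G):
    # Spectral sign: set every singular value of G to one (the map Muon approximates).
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

W1 = np.zeros((10, 4))                    # SignSVD iterate
W2 = np.zeros((10, 4))                    # SignSGD iterate (proxy for Adam)
lr = 0.05
for _ in range(300):
    W1 -= lr * signsvd_dir(grad(W1))
    W2 -= lr * np.sign(grad(W2))
```

With a fixed step size both methods settle into a small neighborhood of the minimizer rather than converging exactly, which is the regime in which the paper's spectral-preconditioning comparison is made.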
math.OC 2026-05-11 Recognition

Certificates isolate Koopman regression failures by layer

Diagnostic Certificates of Data Quality and Regression Identifiability for Koopman Identification

Regression-spectrum bound controls smallest singular value while state and lifted checks remain independent.

Classical persistent excitation criteria usually assess whether an input or regressor signal is sufficiently rich. In Koopman and EDMD with control (EDMDc), however, data quality is determined by the concatenation of lifted state features and control inputs. Input-rich data can still visit a narrow state region, well-spread state samples can still produce degenerate lifted features, and both can fail to condition the final regression problem. This paper develops a diagnostic certificate framework for locating these failures. The certificates separate state-space coverage and clustering, lifted-feature nondegeneracy, and the final regression spectrum. The regression-spectrum certificate is the layer with direct theoretical guarantees: it controls the active standardized design's smallest singular value, has Fisher-information and one-step EDMDc stability interpretations, and admits a finite-sample lower bound under a population spectral gap. We also give structural examples and a Schur-complement condition showing why state, lifted, input, and regression diagnostics cannot be substituted for one another. As a sampling example, IGPE-DOPT uses these certificates to score candidate trajectory segments. Experiments on Duffing, Van der Pol, and Lorenz systems compare input-, state-, lifted-, and regression-oriented baselines. The results show that certificate layers separate, budget and weights shift bottlenecks, and downstream prediction or control performance is not monotone in any single certificate. The framework is therefore intended as an interpretable diagnostic and data-collection guide, not as a universal optimality claim.
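In its simplest reading, the regression-spectrum layer concerns the smallest singular value of the standardized EDMDc design built by concatenating lifted state features and inputs. A sketch on a scalar toy system (the dynamics, the monomial dictionary, and the standardization convention are illustrative assumptions, not the paper's certificate definitions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = np.zeros(n)
u = rng.standard_normal(n)
for t in range(n - 1):                    # toy controlled system (illustrative)
    x[t + 1] = 0.9 * x[t] + 0.1 * u[t]

def lift(x):
    # Hypothetical monomial dictionary psi(x) = [x, x^2, x^3].
    return np.stack([x, x ** 2, x ** 3], axis=1)

# EDMDc regressor rows concatenate lifted state features and the input.
Z = np.hstack([lift(x[:-1]), u[:-1, None]])
Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)          # standardized design
sigma_min = np.linalg.svd(Zs / np.sqrt(len(Zs)), compute_uv=False).min()
```

A small `sigma_min` flags a poorly conditioned regression even when the input and state trajectories individually look rich, which is the non-substitutability point the abstract makes.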
math.OC 2026-05-11 Recognition

Mobile multiplicative control steers quasilinear parabolic equations to rest

Controllability of quasilinear parabolic equations under multiplicative mobile controls

Decay of the uncontrolled system enables a smooth transition that reaches exact zero state at finite time T.

This paper addresses the controllability of a class of quasi-linear parabolic equations governed by multiplicative controls with mobile support. To prove the existence of such a control forcing the solution to rest at time $T>0$, we first establish the decay property of solutions for the uncontrolled system. Unlike the case of the linear heat equation, the nonlinearity in the principal part of the operator introduces significant challenges. These difficulties necessitate a novel approach, ultimately leading us to solve the controllability problem within the framework of classical solutions. Through a carefully constructed smooth transition, we demonstrate that there exists a multiplicative control driving the state exactly to rest at time $t=T$.
math.OC 2026-05-11 2 theorems

Barrier certificates bound violation risk under stochastic predicates

Barrier Certificates for Uncertain Temporal Specifications

Augments the state with Markovian predicate uncertainty to turn the problem into a deterministic barrier-certificate task for linear systems.

This paper studies satisfying temporal logic specifications on stochastic dynamical systems, where the predicates evolve randomly over time. Such randomness may arise from uncertain environment models or external stochastic processes causing the sets associated with predicate satisfaction to vary in a non-deterministic manner. As a result, verifying whether a stochastic dynamical system satisfies a temporal specification depends also on the uncertainty in the predicates. We develop a certificate-based framework to bound the probability of satisfying temporal logic specifications with randomly evolving predicates. We first show that temporal logic specifications with stochastic predicates can be transformed to specifications with deterministic predicates on an augmented space which is extended to include the stochastic space of predicate's uncertainty. We then utilize barrier certificates on an augmented space to provide tractable optimization-based conditions and to avoid the computational burden of dynamic programming. Focusing on linear dynamics and safety-type specifications, we derive analytical conditions under which barrier certificates guarantee bounds on the probability of violating the stochastic safety predicates. The approach is demonstrated on numerical case studies.
math.OC 2026-05-11 Recognition

Nonlinear adjoints yield explicit controls for quadruple linear systems

Extended MF-FBSDEs with nonlinear domination-monotonicity conditions and stochastic optimal controls of Linear System with quadruple controls

Extended domination-monotonicity conditions solve linear-convex and linear-quadratic problems with time-dependent random input constraints.

This paper extends the domination-monotonicity conditions, which guarantee the well-posedness of extended mean-field forward-backward stochastic differential equations (extended MF-FBSDEs), from the previously studied linear framework to a nonlinear setting by incorporating nonlinear adjoint functions. Utilizing this generalized well-posedness result for extended MF-FBSDEs in conjunction with other refined analytical techniques, we address two classes of stochastic optimal control problems with quadruple controls: a linear-convex problem and a linear-quadratic problem with input constraints that are permitted to be time-dependent and random. For each problem, we establish the existence and uniqueness of optimal controls and derive their explicit closed-form representations.
math.OC 2026-05-11 Recognition

Newton method quadratically solves 0-1 loss quadratic SVM

Newton Method for Soft Quadratic Surface Support Vector Machine with 0-1 Loss Function

It reaches higher accuracy with lower CPU time than other solvers on artificial and benchmark datasets.

A nonlinear kernel-free soft quadratic surface support vector machine model with 0-1 loss function ($L_{0/1}$-SQSSVM), which is non-convex and discontinuous, is proposed for binary classification problems. We establish first- and second-order optimality conditions for the $L_{0/1}$-SQSSVM and derive a stationary equation using properties of the proximal operator of the 0-1 loss function. We design a Newton method based on this stationary equation to solve the $L_{0/1}$-SQSSVM model and prove that the Newton method has local quadratic convergence under a second-order sufficient condition. Numerical experiments on artificial and benchmark datasets demonstrate that the Newton method for $L_{0/1}$-SQSSVM achieves higher classification accuracy with lower CPU time than other state-of-the-art methods.
math.OC 2026-05-11 Recognition

MI density control for linear systems matches Schrödinger bridges

Mutual Information Optimal Density Control of Linear Systems and Generalized Schr\"{o}dinger Bridges with Reference Refinement

Alternating optimization steps coincide exactly when Gaussian marginals are imposed at chosen times.

We consider a mutual information (MI) regularized version of optimal density control of a discrete-time linear system. MI optimal control has been proposed as an extension of maximum entropy optimal control to trade off between control performance and benefits provided by stochastic inputs. MI regularization induces stochasticity in the policy, which poses challenges for applications of MI optimal control in safety-critical scenarios. To remedy this situation, we impose Gaussian density constraints at specified times to directly control state uncertainty. For this MI optimal density control problem, we propose an alternating optimization algorithm and derive the closed form of each step in the algorithm. In addition, we reveal that the alternating optimization of the MI optimal density control problem coincides with that of the so-called generalized Schr\"{o}dinger bridge problem associated with the discrete-time linear system.
math.OC 2026-05-11 2 theorems

Nonlocal control problems converge to local ones as s to 1 or delta to 0

Localization for nonlocal gradient-based optimal control problems

Two separate limits recover classical local optimal control from the nonlocal setting for both convex p-Laplacian and polyquasiconvex cases.

In this paper we consider optimal control problems in the nonlocal function space framework of Bellido-2023, where there are two different parameters: a horizon parameter $\delta > 0$; and a fractional parameter $s \in (0, 1)$. The constraints are given in the form of minimizing an energy density, and we will focus on two particular cases: the well-posed case where the underlying energy density is convex and is given by the nonlocal $p$-Laplacian; and a more general poly/quasiconvex energy for which minimizers exist but may not be unique. The study is concluded by analyzing the approximation to local problems in two parallel ways, either taking the fractional parameter $s$ to $1$ or the horizon parameter $\delta$ to $0$.
math.OC 2026-05-11 2 theorems

Unique optimistic choice enables bilevel differentiability on manifolds

Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets

A pseudoinverse formula for the hyper-gradient extends classical results to cases with manifold lower-level solutions and yields complexity governed by the manifold's intrinsic dimension.

We study optimistic bilevel optimization when the lower-level problem has a non-isolated manifold of minimizers. In this setting, the hyper-objective may be non-differentiable because the upper-level criterion must choose among multiple lower-level solutions. Under a local Polyak--{\L}ojasiewicz (P{\L}) condition, we show that differentiability does not require the lower-level solution set to be a singleton: uniqueness of the optimistic selection is sufficient. This yields an explicit pseudoinverse-based hyper-gradient formula extending the classical singleton-minimizer result. We further characterize the regularity of the hyper-objective: non-degeneracy of the selected minimizer along the solution manifold yields local smoothness, while failure of uniqueness can create many non-differentiable points and failure of non-degeneracy can destroy all positive H\"older regularity of the hyper-gradient. Motivated by this theory, we propose HG-MS, a select-then-differentiate method combining explicit optimistic selection with efficient pseudoinverse-based hyper-gradient computation. Despite the nonconvex nature of optimistic selection over the lower-level solution manifold, we show that HG-MS converges to a stationary point of the optimistic objective with complexity governed by the intrinsic dimension of the solution manifold rather than its ambient dimension. Empirically, we test a practical variant of HG-MS for matched-budget LLM source reweighting. This variant preserves the select-then-differentiate principle and obtains the best GSM8K/MATH scores across the tested backbones, along with competitive or best MT-Bench instruction-following results.
math.OC 2026-05-11 Recognition

Optimal control exists for Stokes-Cahn-Hilliard-Oono flows

An optimal control problem for Stokes-Cahn-Hilliard-Oono equations with regular potential

Existence of an L2-cost minimizer is shown and first-order conditions are obtained from the adjoint system for two immiscible incompressible fluids.

This article discusses an optimal control problem for a phase field model of the flow of two immiscible incompressible fluids, incorporating surface tension effects. The optimal control problem is defined with an $L^2$-cost functional and subject to the constraints governed by a system of coupled Stokes-Cahn-Hilliard-Oono equations. In this model, fluids are separated by a dynamic diffuse interface of finite width. We investigate the optimality condition of a given control. Initially, we establish the existence of an optimal solution for the coupled optimal control problem. Subsequently, we derive the optimality condition with respect to the corresponding adjoint system.
math.OC 2026-05-11 Recognition

Pre-check detects unbounded polynomial problems before solving

A Certificate of Unboundedness for Polynomial Optimization Problems

Simple algorithm certifies lack of lower bound in any polynomial optimization problem and reveals its asymptotic geometry.

Global polynomial optimization methods typically rely on compactness of the feasible region in order to find solutions. These methods can incur considerable computational expense, and most commercially available solvers do not verify the existence of a solution prior to undergoing global search. In this manuscript we propose a simple pre-processing algorithm that determines whether an arbitrary polynomial optimization problem is unbounded from below, thereby providing information about the problem's asymptotic geometry before a costly global search is attempted.
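The abstract does not spell out the certificate itself. As a point of contrast, a naive numeric probe for unboundedness from below (sample directions, evaluate far out) can be sketched as follows; this is a heuristic screen only, not the paper's algorithm, and all names are ours:

```python
import numpy as np

def probe_unbounded_below(p, dim, radius=1e4, n_dirs=500, threshold=-1e6, seed=0):
    """Heuristic screen: evaluate p far out along random unit directions and
    flag any direction where the value is hugely negative. Not a certificate:
    it can miss unbounded problems and says nothing about constrained ones."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, dim))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    vals = np.array([p(radius * d) for d in dirs])
    k = int(np.argmin(vals))
    return bool(vals[k] < threshold), dirs[k]

p1 = lambda x: x[0]**2 - x[1]**3   # unbounded below along +y
p2 = lambda x: x[0]**2 + x[1]**2   # coercive
flagged1, direction = probe_unbounded_below(p1, 2)  # flagged1 is True
flagged2, _ = probe_unbounded_below(p2, 2)          # flagged2 is False
```

The returned direction hints at the asymptotic geometry the paper formalizes: here it points toward the positive $y$ half-space along which $p_1$ decreases without bound.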
math.OC 2026-05-11 2 theorems

Approximations give controllability to degenerate hyperbolic equations

Approximation of Degenerate Hyperbolic Equations with Interior Degeneracy and Applications to Controllability

A sequence of uniformly hyperbolic equations approximates the interior-degenerate system, enabling the first controllability results in higher dimensions.

In this paper, we establish the existence of solutions for a particular class of degenerate hyperbolic equations. We then approximate these degenerate equations by a sequence of uniformly hyperbolic equations; this approximation result appears to be new to the literature. Finally, we use this approximation framework to derive controllability results for the original degenerate hyperbolic equations, in what may be the first investigation of higher-dimensional degenerate hyperbolic equations.
math.OC 2026-05-11 2 theorems

Optimizing reduced matrices on a manifold lowers H2 error

Riemannian optimal reduction for linear systems with quadratic outputs

For linear systems with quadratic outputs the method searches coefficient matrices directly on a stability manifold and cuts the variable count dramatically.

Figure from the paper
This paper presents an H2-optimal model order reduction (MOR) method for linear systems with quadratic outputs based on Riemannian optimization. The H2-optimal MOR is formulated as an optimization problem in which the optimization variables are chosen directly as the coefficient matrices of reduced models. A product manifold is suitably defined to impose the stability condition on reduced models. By exploiting the geometric properties of the product manifold, we derive an explicit formula for the Riemannian gradient of the objective function, and a limited-memory Riemannian BFGS method is adopted to solve the resulting optimization problem iteratively. In contrast to selecting projection matrices, optimizing the coefficient matrices of reduced models reduces the number of variables dramatically. Numerical simulation results demonstrate that the reduced models accurately approximate the original system and exhibit superior performance in terms of H2 error, confirming the effectiveness of the proposed algorithm.
math.OC 2026-05-11 2 theorems

Soft penalties make stochastic paths match observed marginals

Learning Generative Dynamics with Soft Law Constraints: A McKean-Vlasov FBSDE Approach

McKean-Vlasov FBSDE learns globally coupled trajectories from endpoint and intermediate distributional data without hard maps.

Figure from the paper
We propose a generative framework for learning stochastic dynamics from endpoint and intermediate distributional observations. The method formulates generation as a McKean-Vlasov control problem in which terminal and time-marginal laws are enforced through soft energy constraints. The associated optimality system is a forward-backward stochastic differential equation (FBSDE) whose backward component receives a continuous drift induced by the marginal law penalties. This provides a principled alternative to hard interpolation or optimal transport maps between observed distributions: the model learns a stochastic path law whose dynamics remain globally coupled through the mean-field objective. We derive the reduced FBSDE system for quadratic control cost and constant diffusion, connecting terminal and marginal law flat derivatives to score-like training signals. The resulting neural solver is evaluated on low-dimensional distributional benchmarks, where it recovers smooth stochastic paths matching prescribed marginal laws. In a higher-dimensional ALAE latent space, endpoint supervision is used as a qualitative stress test for transporting non-smiling faces toward smiling ones in a pretrained representation. We then use articulated human motion as a structured high-dimensional case study on a curated AMASS low-to-high position dataset, using SMPL-H pose sequences and reduced pose representations. The experiments show that soft marginal law constraints can produce coherent stochastic trajectories whose intermediate distributions follow the observed evolution of human motion. The code is available at https://github.com/murex/deep-mkv-gen/tree/main.
math.OC 2026-05-11 Recognition

GP-MPC turned into exact LPV for fast quadcopter control

Efficient sparse GP-MPC with accurate mean and variance propagation applied for quadcopter flight control

Closed-form moment matching lets mean and variance propagate accurately so the problem solves as quadratic programs.

Figure from the paper
This paper presents a computationally efficient approach for Gaussian process model predictive control (GP-MPC), where Gaussian process (GP) regression is used to complement a baseline model of the system dynamics. The proposed method achieves propagation of both the predicted mean and variance, thereby significantly reducing conservativeness compared with existing GP-MPC formulations. The nonlinear GP-MPC problem is reformulated into an exact linear parameter-varying (LPV) structure that preserves the nonlinear prediction dynamics in affine form without introducing further approximation. Moreover, closed-form derivations of moment matching (MM) predictions for sparse GPs are developed, including both mean and variance propagation under uncertain inputs, which improves scalability to larger datasets. This further enables recasting the resulting GP-MPC problem as a sequence of quadratic programs (QPs), which can be solved efficiently. The proposed framework significantly improves runtime efficiency while maintaining prediction accuracy, as demonstrated through simulation and real-world experiments on a Crazyflie 2.1 micro quadcopter.
math.OC 2026-05-11 2 theorems

Variance reduction shortens time complexity in parallel optimization

Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

Rennala MVR improves wall-clock performance over Rennala SGD for heterogeneous clusters when mean-squared smoothness holds.

Figure from the paper
Large-scale machine learning models are trained on clusters of machines that exhibit heterogeneous performance due to hardware variability, network delays, and system-level instabilities. In such environments, time complexity rather than iteration complexity becomes the relevant performance metric for optimization algorithms. Recent work by Tyurin and Richt\'{a}rik (2023) established the first time complexity analysis for parallel first-order stochastic optimization, proposing Rennala SGD as a time-optimal method for smooth nonconvex optimization. However, Rennala SGD is fundamentally a modification of SGD, and variance reduction techniques are known to improve the iteration complexity of SGD. In this work, we investigate whether variance reduction can also improve time complexity in heterogeneous systems. We show that, under a mean-squared smoothness assumption, variance reduction can improve time complexity in relevant parameter regimes. To this end, we propose Rennala MVR, a variance-reduced extension of Rennala SGD based on momentum-based variance reduction, and analyze its oracle and time complexity. We establish lower bounds for time complexity under these assumptions. On a stochastic quadratic benchmark, experiments with the exact method support the theory, while neural-network experiments with a practical inexact variant show similar empirical gains over Rennala SGD.
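The momentum-based variance reduction at the core of Rennala MVR is, in single-worker form, the well-known STORM-type estimator; a minimal sketch on a stochastic quadratic, with the paper's parallel scheduling omitted and all names and parameter values ours:

```python
import numpy as np

def mvr_sgd(sgrad, sample, x0, steps=500, lr=0.05, a=0.1, seed=0):
    """STORM-type momentum-based variance reduction (single worker):
      d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)),
    reusing the same sample xi_t at both iterates, then x_{t+1} = x_t - lr * d_t."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = sgrad(x, sample(rng))
    for _ in range(steps):
        x_prev, x = x, x - lr * d
        xi = sample(rng)  # one fresh sample, evaluated at both iterates
        d = sgrad(x, xi) + (1.0 - a) * (d - sgrad(x_prev, xi))
    return x

# stochastic quadratic: true gradient x, plus additive noise 0.5 * xi
sample = lambda rng: rng.normal(size=2)
sgrad = lambda x, xi: x + 0.5 * xi
x_final = mvr_sgd(sgrad, sample, [5.0, -3.0])  # lands near the minimizer 0
```

Reusing the same sample at consecutive iterates is what makes the estimator's error contract geometrically under mean-squared smoothness; the paper's contribution is the time-complexity analysis of this estimator under heterogeneous parallel workers.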
math.OC 2026-05-11 Recognition

Local LMO matches PGD rates without bounded sets or curvature

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle

A local linear minimization oracle enables projection-free steps that inherit projected gradient convergence guarantees for unbounded sets

Figure from the paper
We design Local LMO - a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, and $t_k>0$ is a suitably chosen radius which can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.
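The displayed iteration can be sketched for a concrete instance: for a box constraint intersected with an l_inf ball (used here for simplicity; the paper's ball $\mathcal{B}(x_k, t_k)$ need not be of this type), the local oracle has a coordinatewise closed form. Names are ours:

```python
import numpy as np

def local_lmo_step(x, g, lo, hi, t):
    """Local LMO step for the box X = [lo, hi] with an l_inf ball B(x, t):
    the intersection is again a box, so min <g, z> is solved coordinatewise
    (pick the lower face where g > 0, the upper face otherwise)."""
    a = np.maximum(lo, x - t)
    b = np.minimum(hi, x + t)
    return np.where(g > 0, a, b)

# minimize f(x) = 0.5 * ||x - c||^2 over the box [0, 1]^2
c = np.array([2.0, -0.5])
lo, hi = np.zeros(2), np.ones(2)
x = np.full(2, 0.5)
for _ in range(50):
    x = local_lmo_step(x, x - c, lo, hi, t=0.1)
# x converges to the projection of c onto the box, (1.0, 0.0)
```

With a box constraint the iterate moves along coordinate directions with an effective stepsize $t$, mirroring the abstract's observation that Local LMO behaves like (projected) gradient descent rather than Frank-Wolfe.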
math.OC 2026-05-11 Recognition

Finite-time regulation achieved for uncertain Euler-Lagrange systems

On Composite Adaptive Continuous Finite-Time Control of a class of Euler-Lagrange systems

Composite adaptive laws using DREM estimators and nonlinear PD feedback drive mechanical systems to set-points in finite time.

Figure from the paper
In this paper, we propose several set-point control schemes for achieving finite-time regulation in a class of Euler--Lagrange systems with $n$ degrees of freedom and uncertain potential energy. The proposed controllers are based on composite adaptive control approaches. Each control scheme consists of two main components: a Proportional--Derivative (PD)-based nonlinear feedback term and a finite-time parameter estimation law. The estimation laws rely on the Dynamic Regressor Extension and Mixing (DREM) technique, which can be designed using either the Kreisselmeier or the least-squares dynamic regressor extensions. These results extend recent advances in finite-time adaptive control for Euler-Lagrange systems. To the best of the authors' knowledge, the composite adaptive control formulation proposed here, which does not employ the well-known Slotine-Li adaptive control structure, has not yet been studied in detail. The properties of the proposed controllers are rigorously analyzed using Lyapunov methods. The performance of the controllers is thoroughly evaluated through extensive simulation studies.
math.OC 2026-05-11 2 theorems

Stationary duality reduces composite counting to simple counting

On the Stationary Duality of Structural Composite Cardinality Optimization

Local solutions of the resulting dual problem correspond one-to-one to primal local solutions, and global solutions correspond when parameters are chosen suitably.

Simple cardinality refers to counting nonzero elements of an independent variable satisfying certain properties. Composite cardinality is a simple counting process composed with an affine mapping, and is therefore more complicated than simple cardinality. We study the composite cardinality optimization problem (CCOP) with structures covering a wide range of applications. Through the use of stationary duality, we reduce composite counting to simple counting, and thereby obtain a dual formulation of CCOP. For both the primal and dual problems, we investigate sufficient conditions for the existence of global solutions. Those conditions are validated on representative examples from the existing literature. We then show that local solutions of the primal and dual problems are equivalent to their stationary points. This result further helps us establish a one-to-one correspondence between primal and dual local solutions. We also demonstrate that the correspondence holds for a pair of global solutions to the primal and dual problems, provided that the dual weight parameters are appropriately selected. The reported theoretical results lay the foundation for developing numerical algorithms for CCOP in the future.
math.OC 2026-05-11 Recognition

Fast splitting method hits optimal O(1/N²) rate

Optimal Acceleration for Proximal Minimization of the Sum of Convex and Strongly Convex Functions

FDR improves constants from prior work and matches a new lower bound for convex-plus-strongly-convex sums.

When minimizing the sum of a convex and a strongly convex function, or when finding the zero of the sum of a monotone operator and a strongly monotone operator, Chambolle and Pock (2010) and Davis and Yin (2015) proposed accelerated mechanisms that achieve an $\mathcal{O}(1/N^2)$ convergence rate for the squared distance to the solution, but the optimality of this rate was not established. In this work, we present Fast Douglas--Rachford Splitting (FDR), an accelerated method that improves the constants established in the prior works, and provide a complexity lower bound establishing that both the $\mathcal{O}(1/N^2)$ convergence rate and the leading-order constant of FDR's rate are optimal.
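FDR's update rule is not given in the abstract; for context, the classical (unaccelerated) Douglas-Rachford iteration that FDR builds on can be sketched on a pair of quadratics with closed-form proximal maps. Names and the test instance are ours:

```python
import numpy as np

def prox_quad(A, b, t, v):
    """prox_{t*h}(v) for h(x) = 0.5 x^T A x + b^T x: solve (I + t A) x = v - t b."""
    return np.linalg.solve(np.eye(len(v)) + t * A, v - t * b)

def douglas_rachford(Af, bf, Ag, bg, z0, t=1.0, iters=200):
    """Classical (unaccelerated) Douglas-Rachford splitting for min f + g
    with quadratic f and g; FDR layers acceleration on this basic scheme."""
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        x = prox_quad(Af, bf, t, z)          # prox of f
        y = prox_quad(Ag, bg, t, 2 * x - z)  # prox of g at the reflection
        z = z + (y - x)                      # fixed-point update
    return x

# f(x) = 0.5||x||^2 (strongly convex), g(x) = 0.5||x - e||^2 (convex), e = (1, 1)
e = np.ones(2)
x_star = douglas_rachford(np.eye(2), np.zeros(2), np.eye(2), -e, z0=np.zeros(2))
# minimizer of f + g is e/2 = (0.5, 0.5)
```

The paper's result concerns the accelerated variant of this scheme: an $\mathcal{O}(1/N^2)$ rate for the squared distance to the solution, shown optimal in both rate and leading constant.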
math.OC 2026-05-11 2 theorems

Nonconvexity indices for C^{1,1} functions via generalized Hessians

Local Nonconvexity Indices for \(C^{1,1}\) Functions via Generalized Hessians

Replacing the Hessian with the closed convex hull of limiting Hessians produces an interval index that reduces to the smooth index in the C^2 case and vanishes for convex functions.

Davydov, Moldavskaya, and Zitikis introduced local indices for quantifying the lack of convexity of a \(C^2\) function by measuring the nuclear-norm distance of its Hessian from the cone of positive semidefinite matrices. This paper develops a local analogue for functions of class \(C^{1,1}\). At a point \(x\), the classical Hessian is replaced by the Clarke-type generalized Hessian set \(\Hess(h;x)\), defined as the closed convex hull of limiting Hessians at nearby twice differentiability points. Evaluating the same spectral functional over \(\Hess(h;x)\) gives an interval-valued local nonconvexity index whose lower and upper endpoints represent, respectively, the least and greatest visible second-order nonconvexity at \(x\). The construction reduces to the original smooth index when \(h\in C^2\), vanishes for convex \(C^{1,1}\) functions, is invariant under orthogonal changes of variables, satisfies a subadditivity inequality for the upper endpoint under sums, and is upper semicontinuous in its upper endpoint. We also relate the upper endpoint to a pointwise weak-convexity curvature modulus and give explicit \(C^{1,1}\setminus C^2\) examples. The paper is deliberately local in scope: it proposes a scalar diagnostic extracted from generalized Hessian sets, not a replacement for the richer second-order variational theory of nonsmooth convexity.
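In the smooth $C^2$ case the index at a point is the nuclear-norm distance from the Hessian to the PSD cone, which equals the sum of the magnitudes of the negative eigenvalues (clipping negative eigenvalues gives the nearest PSD matrix in any unitarily invariant norm, a standard fact). A minimal sketch of that smooth-case functional; the paper evaluates the same functional over generalized Hessian sets:

```python
import numpy as np

def nonconvexity_index(H):
    """Smooth-case index: nuclear-norm distance from symmetric H to the PSD
    cone. Clipping negative eigenvalues to zero gives the nearest PSD matrix,
    so the distance equals the sum of |negative eigenvalues| of H."""
    w = np.linalg.eigvalsh(H)
    return float(np.sum(np.maximum(-w, 0.0)))

H_convex = np.array([[2.0, 0.0], [0.0, 1.0]])   # PSD: index 0
H_saddle = np.array([[1.0, 0.0], [0.0, -3.0]])  # one negative eigenvalue: index 3
idx_convex = nonconvexity_index(H_convex)
idx_saddle = nonconvexity_index(H_saddle)
```

For a $C^{1,1}$ function, the paper's interval index would take the infimum and supremum of this quantity over the generalized Hessian set at the point.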
math.OC 2026-05-11 2 theorems

Accelerated optimizers' stability certified via SDP

A Unified Lyapunov-IQC Framework for Uniform Stability of Smooth Quadratic First-Order Accelerated Optimizers

Lyapunov-IQC modeling turns the problem into an LMI feasibility check solvable by convex optimization.

We develop a unified Lyapunov-integral quadratic constraint (IQC) framework for establishing uniform stability of first-order accelerated optimization algorithms in the $\beta$-smooth and $\gamma$-strongly convex regime. Classical analyses of uniform stability, such as the work of Hardt, Recht, and Singer for stochastic gradient descent (SGD), rely on direct coupling arguments and case-by-case control of iterate differences under random sampling. Extending such arguments to accelerated methods, such as Nesterov Accelerated Gradient (NAG), is complicated by the presence of higher-order state dynamics induced by momentum. We first extend this classical approach with the use of Lyapunov functions to provide a uniform stability bound for smooth quadratic NAG, and supplement this result with small-scale numerical experiments. We then extend this framework by modeling first-order accelerated optimizers as Lur'e-type feedback interconnections between a linear dynamical system and a (non-linear) gradient operator. $\beta$-Smoothness and $\gamma$-strong convexity are encoded via a sector IQC inequality. Under this representation, uniform stability is certified via the existence of a quadratic Lyapunov function satisfying a finite-dimensional linear matrix inequality (LMI) feasibility problem, which can be solved via semi-definite programming (SDP). We instantiate this framework for NAG and show how classical uniform stability bounds can be recovered within it. These results underscore a structural connection between optimization dynamics and robust control theory, providing a modular methodology for reliable and reproducible numerical certification of uniform stability and generalization behavior of first-order methods via convex optimization tools that is adaptable to increasingly complex optimization algorithms.
math.OC 2026-05-11 2 theorems

Robust Learning Meets Quasar-Convex Optimization: Inexact High-Order Proximal-Point Methods

HiPPA reaches minima with linear or superlinear local rates once inexactness is controlled by the model-value gap.

Figure from the paper
Robust learning aims to maintain model performance under noise, corruption, and distributional shifts, which are prevalent in modern machine learning applications. This work shows that examples of robust learning problems can be formulated as (strongly) quasar-convex optimization problems, which admit a benign landscape with no saddle points. We then propose HiPPA, an inexact high-order proximal-point method that employs a model-value gap to control the inexactness of subproblem solutions. Notably, we prove global convergence of HiPPA to global minima and establish that it attains a (local) linear or superlinear convergence rate, depending on the regularization order and inexactness control. Our numerical experiments on robust feature-alignment distillation indicate strong empirical performance of HiPPA and results consistent with our theoretical findings.
math.OC 2026-05-11 2 theorems

Learning method scales nonlinear self-optimizing control to large processes

Generalized Global Self-Optimizing Control for Chemical Processes: Part II Objective-Guided Controlled Variable Learning Approach

OGCVL blends symbolic and numerical techniques to design effective controlled variables while remaining computationally feasible.

Figure from the paper
Self-optimizing control (SOC) aims to maintain near-optimal process operation by judiciously selecting controlled variables (CVs). In this series of work, the generalized global SOC (g2SOC) approach is proposed, which extends the concept of SOC to the whole operation space and uses general nonlinear functions to design CVs instead of linear combinations. In the first part of this series, two numerical approaches for g2SOC were proposed, based on a theoretical analysis of the existence of perfect self-optimizing CVs: the optimization-based approach and the regression-based approach. The CVs designed by the former perform better, but the approach is usually infeasible for large-scale problems. In this paper, we propose an algorithm called objective-guided controlled variable learning (OGCVL) that combines the advantages of both and offers better scalability. OGCVL enables efficient CV design by seamlessly integrating symbolic and numerical computation techniques. Finally, the effectiveness of the OGCVL method is verified on two numerical examples. Both examples show that the OGCVL method achieves good results while maintaining computational efficiency and remains feasible for large-scale problems.
math.OC 2026-05-11 2 theorems

Only three of fourteen transcription methods pass rocket landing validation

Transcription-Induced Failure Modes in 6-DOF Rocket Landing Trajectory Optimization

An adversarial objective exposes truncation errors and quaternion drift that make most discretizations fail in 6-DOF trajectory optimization

Figure from the paper
Solving optimal control problems via large-scale NLP solvers depends on discretizing continuous dynamics. Yet this transcription step hides critical vulnerabilities, most notably truncation error and invariant drift, that can drive solvers toward dynamically infeasible or suboptimal trajectories. To expose these hidden failures, we introduce a problem- and transcription-agnostic adversarial objective that leverages the structure of local truncation-error bounds to aggressively amplify such defects. When applied to a 6-DOF rocket-landing problem, we reveal a stark reliability gap: of fourteen transcription methods tested, only three satisfy rigorous validation criteria. These results also expose a striking performance inversion: even in the absence of classical stiffness, a fourth-order implicit scheme (GL2) matches the fidelity of a sixth-order explicit method (RK6). Using B-series expansions and symplectic Runge-Kutta theorems, we isolate the specific truncation errors and quaternion-invariant drift responsible for these failures. Crucially, these theoretical vulnerabilities dictate operational performance: in practical lateral-divert scenarios, the implicit GL2 consistently outperforms the explicit RK6 in both end-to-end solve speed and robustness.
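The quaternion-invariant drift is structural, not roundoff: the quaternion rate matrix is skew-symmetric, so the exact flow preserves the unit norm, while explicit Euler inflates the squared norm by a factor of 1 + (h||w||/2)^2 per step. A minimal sketch illustrating the drift (scalar-first convention; names and parameters ours, not the paper's test setup):

```python
import numpy as np

def omega(w):
    """Quaternion rate matrix (scalar-first): qdot = 0.5 * omega(w) @ q.
    It is skew-symmetric, so the exact flow preserves ||q||."""
    wx, wy, wz = w
    return np.array([[0., -wx, -wy, -wz],
                     [wx,  0.,  wz, -wy],
                     [wy, -wz,  0.,  wx],
                     [wz,  wy, -wx,  0.]])

def euler_norm_drift(w, h=0.01, steps=1000):
    """Explicit Euler inflates ||q||^2 by 1 + (h*||w||/2)^2 per step:
    a structural truncation effect, independent of roundoff."""
    q = np.array([1., 0., 0., 0.])
    Om = 0.5 * omega(np.asarray(w, dtype=float))
    for _ in range(steps):
        q = q + h * (Om @ q)  # first-order explicit step, norm grows
    return np.linalg.norm(q) - 1.0

drift = euler_norm_drift([1.0, 0.5, -0.3])  # noticeably positive after 1000 steps
```

Higher-order explicit schemes shrink this drift but do not eliminate it, which is one reason the paper's adversarial objective can exploit the invariant in transcriptions that lack symplectic structure.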
math.OC 2026-05-11 Recognition

Approximate directional stationarity is necessary for local minima

Approximate directional stationarity and associated qualification conditions

A single-sequence Mangasarian-Fromovitz type check upgrades approximate directional stationarity to directional stationarity under nonsmooth geometric constraints.

Approximate stationarity conditions provide necessary optimality conditions without requiring additional assumptions by demanding that a perturbed stationarity system possesses solutions as the involved perturbations tend to zero. Together with associated approximate constraint qualifications, which are typically rather mild, they have attracted much interest in the optimization community over the last decade. In parallel, directional stationarity conditions have become quite popular as they sharpen standard stationarity conditions by incorporating data associated with underlying critical directions. The purpose of this paper is twofold. First, we meld the aforementioned concepts of approximate and directional stationarity to formulate and study so-called approximate directional stationarity. As the underlying model problem, an optimization problem with nonsmooth geometric constraints is chosen, which covers diverse practically relevant applications. The role of approximate directional stationarity as a necessary optimality condition is investigated in detail, complementing results from the literature. Second, we formulate a qualification condition which, based on an approximately directionally stationary point, can be exploited to infer its directional stationarity. The latter condition depends on one particular sequence verifying approximate directional stationarity and merely requires checking a simple condition of Mangasarian--Fromovitz type stated in terms of the directional tools of limiting variational analysis. This contrasts with standard approximate constraint qualifications, which typically demand a certain stable behavior of all sequences validating approximate stationarity. Throughout, various approaches to verifying directional stationarity of local minimizers are established, and illustrative examples are presented to make the theoretical results more accessible.
math.OC 2026-05-11 2 theorems

Penalty methods reach ε-KKT points for bilevel minimax problems in Õ(ε^{-4}) steps

Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems

First-order algorithms handle non-strongly-convex lower levels and improve prior bounds for constrained inner problems to the same rate.

Figure from the paper
We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $\epsilon$-KKT point with $\tilde{O}(\epsilon^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(\epsilon^{-4})$ complexity bound that improves upon the existing $\tilde{O}(\epsilon^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $\epsilon$-KKT point with $\tilde{O}(\epsilon^{-9})$ oracle complexity.
math.OC 2026-05-11 Recognition

Robust Capacity Expansion under Wildfire Ignition Risk and High Renewable Penetration

It solves for least-cost investments that survive simultaneous worst-case line de-energization and renewable shortfalls.

Figure from the paper
In power systems, the risk of wildfire ignition has increased significantly in recent years. The impact and severity of these events on energy dispatch, as well as their societal ramifications, make wildfire prevention critical for power system planning and operation. A common intervention by system operators is to de-energize transmission lines to mitigate the risk of fire caused by equipment failures. With the growing integration of variable renewable generation, managing and preparing the system for de-energization under wildfire risk has become even more challenging. In this context, mitigation decisions such as installing battery energy storage systems and undergrounding transmission lines can reduce the risk and adverse effects associated with de-energization and renewable generation variability. This paper presents a robust optimization model to determine optimal investments in battery storage siting and transmission line undergrounding, utilizing representative weeks and uncertainty sets to capture the temporal relationship of uncertain variables. Specifically, this paper addresses: (i) the worst-case realization of ignition risk leading to the de-energization of transmission lines, combined with the worst-case realization of renewable energy availability, and (ii) the optimal investment decisions for energy storage capacity and undergrounding of transmission lines that are exposed to ignition risk. The proposed model is formulated as a mixed-integer linear programming (MILP) problem, employing duality theory and binary decomposition to address nonlinearities, and is solved using a column-and-constraint generation algorithm. The proposed framework is evaluated on a model of the San Diego power system, demonstrating its practical effectiveness in improving resilience to wildfire risk.
math.OC 2026-05-11 2 theorems

Bidirectional compression scales distributed SGD with worker count

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

Inkheart SGD and M4 achieve improved time complexities that benefit from larger n under a structural assumption needed to bypass the lower bound.

Figure from the paper
In centralized, distributed, and federated learning with stochastic gradients and $n$ workers, it was recently shown that it is infeasible to find an $\varepsilon$-stationary point faster than $\tilde{\Omega}\left(\min\left\{\frac{d \kappa L \Delta}{\varepsilon} + \frac{h L \Delta}{\varepsilon} + \frac{h \sigma^2 L \Delta}{n \varepsilon^2},\; \frac{h \sigma^2 L \Delta}{\varepsilon^2} + \frac{h L \Delta}{\varepsilon}\right\}\right)$ seconds in both homogeneous and heterogeneous settings under standard assumptions: $L$-smoothness, $\sigma^2$-bounded unbiased stochastic gradients, and lower boundedness of the function, i.e., $f(x) \geq f^*$ for all $x \in \mathbb{R}^d$, where $\Delta = f(x^0) - f^*$, $h$ is the computation time, $\kappa$ is the communication speed between the workers and the server, and $d$ is the dimension of the iterates and gradients. This result is pessimistic since it does not allow a complexity in which both $\frac{d \kappa L \Delta}{\varepsilon}$ and $\frac{h \sigma^2 L \Delta}{\varepsilon^2}$ improve with $n$, even when using random sparsification techniques; moreover, this lower bound can be matched by either non-distributed SGD or vanilla Synchronous SGD, which reduces the impact of recent progress in the design of compression-based methods. In this work, we challenge this limitation and propose new compressed methods, Inkheart SGD and M4, and show that under an additional structural assumption, which is necessary due to the lower bound and which does not restrict the class of considered problems, we achieve new state-of-the-art time complexities that break this pessimistic barrier and allow scaling with the number of workers $n$.
math.OC 2026-05-11 2 theorems

Medoid gradients achieve O(1/T) convergence under infinite-variance noise

Robust stochastic first order methods in heavy-tailed noise via medoid mini-batch gradient sampling

R-SGD-Mini splits batches into chunks and selects the medoid one to prove explicit rates for non-convex problems with symmetric heavy tails.

Figure from the paper
We consider a first order stochastic optimization framework where, at each iteration, $K$ independent identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variances. For this heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied to noisy gradients. In this paper, we take an alternative approach and propose a novel stochastic first order method dubbed Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling, R-SGD-Mini for short. The core idea of R-SGD-Mini is to split the $K$-sized data batch into $M$ distinct data chunks, form the stochastic gradient of each chunk, and update the solution estimate along the stochastic gradient of the chunk that is the medoid of the gradients of all data chunks. Under a general class of symmetric heavy-tailed gradient noises and a standard non-convex setting, we establish explicit bounds on the expected time-averaged squared gradient norm. More precisely, we show that the latter quantity converges at rate $\mathcal{O}(T^{-1})$ to a small neighborhood of zero; we explicitly characterize this neighborhood in terms of the noise and the algorithm's parameters. Moreover, if the time horizon is known in advance, we establish the rate of $\mathcal{O}(T^{-\frac{1}{2}})$. Furthermore, when clipping is incorporated, we obtain convergence guarantees in the high-probability sense and recover the same rate. Experimental results indicate that R-SGD-Mini and its clipped variant consistently perform favorably compared to SGD, clipped SGD and Median-of-Means based methods.
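The chunk-and-medoid rule can be sketched directly from the description: split the batch into M chunks, average a stochastic gradient per chunk, and step along the chunk gradient with the smallest total distance to the others. An illustrative reading on a quadratic with Student-t gradient noise; names and parameters are ours, not the authors':

```python
import numpy as np

def medoid_gradient_step(x, samples, sgrad, M, lr, rng):
    """Split the batch into M chunks, average one stochastic gradient per
    chunk, and step along the chunk gradient that is the medoid, i.e. the
    one with the smallest total Euclidean distance to the other chunk
    gradients. Illustrative sketch of the stated rule."""
    chunks = np.array_split(rng.permutation(samples), M)
    grads = np.stack([np.mean([sgrad(x, s) for s in c], axis=0) for c in chunks])
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1).sum(axis=1)
    return x - lr * grads[int(np.argmin(dists))]

# quadratic loss with heavy-tailed Student-t (df = 1.5, infinite variance) noise
rng = np.random.default_rng(0)
sgrad = lambda x, s: x + s  # true gradient x plus one noise sample
x = np.array([4.0, -4.0])
for _ in range(300):
    batch = rng.standard_t(1.5, size=(30, 2))  # K = 30 fresh samples per step
    x = medoid_gradient_step(x, batch, sgrad, M=5, lr=0.1, rng=rng)
```

Because the medoid must lie close to a majority of the chunk gradients, a single wild chunk mean cannot be selected, which is the robustness mechanism the paper analyzes under symmetric heavy tails.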
math.OC 2026-05-11 2 theorems

Distributed algorithm converges for biased stochastic fixed points

Distributed Seeking for Fixed Points of Biased Stochastic Operators: A Communication Efficient Approach

Surrogate function and relaxed bias conditions enable communication-efficient iterations that also link to non-convex optimization.

abstract
This paper investigates the distributed fixed point seeking problem of sum-separable stochastic operators over the multi-agent network. Based on inexact Krasnosel'skiĭ–Mann iterations, the communication-efficient distributed algorithm is proposed under the relaxed growth bias and variance conditions, generalizing traditional unbiased and bounded additive variance assumptions. To enhance communication efficiency, we integrate communication compression and dynamic period skipping techniques, particularly adopting a unified compressor that allows both relative and absolute compression errors. By introducing a surrogate function for general non-contractive and contractive operators, we establish convergence guarantees of the distributed fixed point iteration, achieving among the first theoretical unifications with distributed non-convex optimization algorithms. Finally, numerical simulations validate the effectiveness of the theoretical results.
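For reference, the centralized, exact Krasnosel'skiĭ–Mann iteration that the paper distributes inexactly is just the averaged fixed-point update (a minimal sketch; the operator `T` and relaxation parameter `lam` below are illustrative):

```python
import numpy as np

def krasnoselskii_mann(T, x0, lam=0.5, iters=200):
    # Averaged fixed-point iteration: x <- (1 - lam) * x + lam * T(x).
    # For a nonexpansive T with a fixed point, iterates converge to one;
    # the paper's distributed variant tolerates bias, noise, and
    # compression error on top of this basic scheme.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = (1.0 - lam) * x + lam * T(x)
    return x
```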
math.OC 2026-05-11 Recognition

Covariates adapt uncertainty sets to cut conservatism in power dispatch

Data-Driven Contextual-Aware Uncertainty Set for Robust Dispatch of Power Systems

A conditional Gaussian mixture model uses side information to tailor uncertainty sets to irregular distributions while preserving robustness

Figure from the paper
abstract
Both the level of conservativeness and the computational burden in robust optimization are critically influenced by uncertainty set design. However, contextual side information is rarely exploited in robust dispatch of power systems characterized by irregular data distributions, which hinders the explicit characterization of the relationship between covariates and uncertain parameters. To address this issue, a data-driven method for constructing contextual-aware uncertainty set is proposed in this letter. Based on a conditional Gaussian mixture model, a set of covariates is leveraged as side information to design uncertainty sets tailored to historical data exhibiting irregular distributions. The resulting set is formulated as a union-of-subsets formulation, and a mixed integer linear reformulation is adopted to describe the worst-case realization across all subsets. Finally, the effectiveness of the proposed method is demonstrated through numerical experiments applied to robust unit commitment.
math.OC 2026-05-11 Recognition

Faster pursuer captures evader in finite time on ellipse

Dynamical Systems in Elliptical Pursuit and Evasion

Elliptical pursuit yields explicit capture bounds when the pursuer is faster and global convergence to a periodic orbit when it is slower.

Figure from the paper
abstract
This paper investigates the difference between the circular and elliptical cases in one-on-one pursuit and evasion problems. Using the simultaneous differential equation derived by Barton and Eliezer, we derive a dynamical system based on the assumption that the shape of the pursuer's trajectory is unaffected by the evader's speed. The dynamical system involves the angular difference between the velocity vectors of the players and their separation distance. When the evader orbits a circle, the dynamical system is autonomous with an asymptotically stable equilibrium point. By contrast, if the evader orbits an ellipse, the dynamical system becomes non-autonomous and lacks an equilibrium point. To handle the singularity at capture, we reformulate the system using a complex variable that includes information about the logarithmic distance and the angular difference. We establish two main results: when the pursuer is faster than the evader, the pursuer captures the evader in finite time, and we derive an explicit upper bound for the capture time; when the pursuer is slower, the system possesses a unique periodic solution to which all trajectories converge globally and asymptotically.
math.OC 2026-05-11 1 theorem

Coderivative conditions ensure stability of conic Nash equilibria

Stability of Lagrangian Generalized Nash Equilibriums

Characterizations via normal cone mappings show when small data changes preserve equilibrium sets in generalized games.

abstract
Lagrangian generalized Nash equilibriums (LGNEs) were introduced by Rockafellar (2024) for a class of generalized Nash equilibrium problems (GNEPs) in which each player's strategy is subject to conic constraints. This paper investigates the stability properties of the LGNE solution set, specifically focusing on the Aubin property, isolated calmness, and Lipschitz continuous single-valued localization. For general conically constrained GNEPs, characterizations of the Aubin property and isolated calmness of the LGNE solution mapping under canonical perturbations are established. These characterizations are formulated using the coderivative and graph derivative of normal cone mappings. Subsequently, these general results are specialized to GNEPs with equality and inequality constraints, yielding explicit characterizations for both the Lipschitz continuous single-valued localization and isolated calmness of the corresponding LGNE solution mapping, which are described by nonsingularity of linear complementarity systems. For GNEPs with shared conic constraints, the Aubin property and isolated calmness of the consensus LGNE solution mapping--where identical Lagrange multipliers are assigned to the shared constraint--are first characterized. We further analyze the case when the conic constraints are specialized as equalities and inequalities. Finally, for classical conically constrained Nash equilibrium problems, the Aubin property and isolated calmness of the Lagrangian Nash equilibrium solution mapping are also analyzed.
math.OC 2026-05-11 2 theorems

Prox-PEP hits stationarity at O(T^{-1/4}) rate for weakly convex stochastic problems

Prox-PEP: A Proximal Partial Exact Penalty Algorithm for Weakly Convex Stochastic Nonlinear Programming

Quadratic approximations and a dynamic penalty keep subproblems strongly convex, delivering both expected and high-probability bounds on the Moreau-envelope gradient.

abstract
This paper considers stochastic optimization problems with weakly convex objective and constraint functions. We propose Prox-PEP, a proximal method equipped with quadratic subproblems. To handle nonlinear equality constraints, we employ an exact penalty approach, transforming them into inequality constraints with auxiliary slack variables. At each iteration, we construct quadratic approximations for both the objective and the constraint functions to facilitate efficient subproblem computation. By carefully designing the second-order approximation matrices, the subproblem constructed via the augmented Lagrangian function is strictly guaranteed to be strongly convex. Furthermore, we adopt a dynamic strategy for the equality penalty parameter: it monotonically increases up to a predefined threshold and remains constant thereafter. Building upon this algorithmic framework, we establish comprehensive asymptotic complexities. We prove that Prox-PEP achieves an $\mathcal{O}(T^{-1/4})$ average expected oracle complexity for $\epsilon$-KKT stationarity, specifically bounding the squared norm of the gradient of the Moreau envelope of the Lagrangian function, alongside constraint violations and complementarity conditions. Additionally, under standard light-tailed martingale noise assumptions, we derive an $\mathcal{O}(T^{-1/8})$ high-probability convergence bound for the norm of the gradient of the Lagrangian's Moreau envelope, as well as $\mathcal{O}(T^{-1/4})$ high-probability bounds for both constraint violations and complementarity conditions.
math.OC 2026-05-11 2 theorems

LS-SVM reformulation enables CV-based feature selection for SVMs

Cross-validation-based optimal feature selection for linear SVM classification

A bilevel problem reduces to a standard mixed-integer program that matches or beats L1 regularization and recursive elimination on accuracy.

abstract
This paper addresses feature subset selection for Support Vector Machines (SVMs) based on the cross-validation criterion. Unlike statistical criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), cross-validation requires only the mild assumption that samples are independently and identically distributed (i.i.d.). For this reason, the cross-validation criterion is expected to work well across a wide range of prediction problems, and it has already demonstrated its usefulness as a feature subset selection method for regression. The objective of this paper is to extend the framework of best feature subset selection via the cross-validation criterion to SVM classification problems. This subset-selection problem can be formulated as a bilevel mixed-integer optimization problem. Because bilevel optimization problems are generally hard to solve, we introduce the Least Squares Support Vector Machine (LS-SVM), whose optimality conditions admit a closed-form expression, and reduce the problem to a single-level mixed-integer optimization problem. This reformulation allows us to solve the problem using standard optimization software. We evaluate the proposed framework through simulation experiments that compare it with a regularization-based method (L1-regularization), a sequential search method (recursive feature elimination), and mixed-integer optimization (MIO) based on statistical criteria. The results show that the proposed framework achieves favorable performance both in classification accuracy and feature selection accuracy.
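The closed form that makes the single-level reduction possible is the LS-SVM linear system. A sketch for the linear-kernel case (function names and the regularization symbol `gamma` are ours; the structure follows standard LS-SVM optimality conditions):

```python
import numpy as np

def ls_svm_fit(X, y, gamma=1.0):
    # LS-SVM optimality conditions reduce to one linear system:
    #   [ 0        1^T        ] [ b     ]   [ 0 ]
    #   [ 1   K + I / gamma   ] [ alpha ] = [ y ]
    # with K the (here linear) kernel matrix, so training is closed-form.
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = X @ X.T + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    b, alpha = sol[0], sol[1:]
    return X.T @ alpha, b  # primal weights w and bias b

def ls_svm_predict(w, b, X):
    return np.sign(X @ w + b)
```

Because training is a single linear solve, the cross-validation error becomes an explicit function of the data, which is what lets the bilevel selection problem collapse to a single-level mixed-integer program.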
math.OC 2026-05-08 Recognition

Newton steps for constrained control reduce to reweighted Riccati solves

A Semi-smooth Newton Method for the Constrained Optimal Control of Continuous-Time Linear Systems

Embedding KKT conditions in a complementarity function lets a semi-smooth Newton method handle continuous-time linear systems with superlinear convergence.

Figure from the paper
abstract
This paper details a novel indirect method for solving constrained optimal control problems (OCPs) directly in continuous-time function space. The KKT conditions are embedded in a non-smooth complementarity function, which enables their reformulation as a rootfinding problem in Banach space. This problem is then solved using a non-smooth Newton method. Finally, the paper shows that the Newton update can be obtained by solving a modified differential Riccati equation, where the cost terms are reweighted at every iteration based on the constraint multipliers. Numerical simulations show the effectiveness of the method, which converges superlinearly up to the tolerance of the ODE solver.
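The non-smooth complementarity reformulation follows the usual pattern: a function that vanishes exactly on the complementarity set. A common choice (illustrative here, not necessarily the one the paper uses) is the Fischer-Burmeister function:

```python
import numpy as np

def fischer_burmeister(a, b):
    # phi(a, b) = a + b - sqrt(a^2 + b^2).
    # phi(a, b) = 0  <=>  a >= 0, b >= 0, and a * b = 0,
    # so KKT complementarity conditions become roots of phi, which is
    # semi-smooth and amenable to a non-smooth Newton method.
    return a + b - np.sqrt(a * a + b * b)
```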
math.OC 2026-05-08 Recognition

Dynamic prices cut synchronized peaks from automated homes

Unlocking Deep Demand Flexibility via Dynamic Signals

Aggregate feedback alone revises day-ahead rates and lowers both peak demand and daily variation on realistic feeders.

Figure from the paper
abstract
The rapid proliferation of distributed energy resources (DERs) and the electrification of residential loads offer significant potential for grid flexibility but pose stability challenges under static pricing regimes. Specifically, high levels of automation under static Time-of-Use (TOU) tariffs often induce ``device synchronization,'' where simultaneous responses from home energy management systems (HEMS) create artificial demand peaks that threaten grid stability. This paper proposes a privacy-preserving, one-way dynamic signaling framework to unlock deep demand flexibility from HEMS. We utilize a feedback-based learning algorithm that updates day-ahead price profiles based on aggregate substation demand and environmental contexts, effectively closing the loop between utility objectives and aggregated edge behaviors. The framework is rigorously validated using high-fidelity simulations on an 84-bus distribution network populated with hundreds of HEMS controlling diverse devices, including HVAC, PV, batteries, and flexible loads. Results demonstrate that the proposed mechanism achieves substantial reductions in both peak demand and total load variation. Extensive analyses across diverse climates and scalable deployments confirm the framework's robustness, indicating that dynamic pricing acts as a force multiplier for DERs, with peak shaving potential increasing significantly under high renewable penetration scenarios.
math.OC 2026-05-08 2 theorems

Relaxation makes dynamic auto-insurance pricing asymptotically optimal at scale

Prescriptive Optimization for Adaptive Auto-insurance Pricing with Telematics Data

Lagrangian decomposition turns the portfolio problem into independent driver systems whose duality gap vanishes with size, outperforming static pricing.

Figure from the paper
abstract
Usage-based insurance (UBI) uses telematics to align premiums with risk and encourage safe driving. However, deploying these programs is challenging due to heavy-tailed claim costs, nonstationary driver behavior, and limited incentive budgets. While existing research focuses on profiling drivers, prescriptive pricing remains underexplored. We propose an optimal control framework that integrates telematics directly into dynamic pricing. Our approach (i) learns claim frequency and severity, (ii) models multi-period behavioral evolution in response to discounts, and (iii) optimizes portfolio-wide discount allocation using a Lagrangian relaxation. This decomposes the non-convex centralized problem into independent dynamical systems. We theoretically prove this relaxation's duality gap vanishes as the portfolio scales, guaranteeing asymptotic optimality. We validate our approach computationally on a simulated industry-scale portfolio. Our results demonstrate not only the computational tractability of our approach but also that it outperforms static baselines, reducing both expected losses and claim probabilities to benefit insurers and policyholders alike.
math.OC 2026-05-08 Recognition

Auxiliary loss replaces squared gradients with cheap Hessian approximations

Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning

The resulting optimizer keeps Adam-level cost per step, inherits matching convergence guarantees, and beats Adam on some supervised learning tasks.

Figure from the paper
abstract
An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model training. The purpose of the auxiliary loss is to provide a mechanism for creating a low-order Hessian-type approximation for the original loss. The proposed algorithm employs the resulting low-order second-derivative approximation terms in place of the second-order momentum terms (i.e., squared elements of the gradient of the loss function) in an overall scheme that has computational cost on par with an Adam-type approach. Whereas the squared elements of a gradient vector do not necessarily approximate second-order derivatives well, by careful construction of the auxiliary loss, second-order derivative-type approximations for the original loss can be computed and employed by the algorithm in an efficient manner. A convergence guarantee is provided for the proposed algorithm that is on par with guarantees available for similar stochastic diagonal-scaling methods. The results of numerical experiments show situations when the proposed algorithm outperforms Adam and other popular modern optimizers.
math.OC 2026-05-08 2 theorems

Muon attains optimal rates for heavy-tailed matrix optimization

Muon with Nesterov Momentum: Heavy-Tailed Noise and (Randomized) Inexact Polar Decomposition

With Nesterov momentum and inexact polar factors, the method reaches ε-stationary points in O(ε^(-(3α-2)/(α-1))) steps under heavy tails.

Figure from the paper
abstract
Most first-order optimizers treat matrix-valued parameters as vectors, ignoring the intrinsic geometry of hidden-layer weights in neural networks. Muon addresses this mismatch by updating along the polar factor of a momentum matrix, but its theoretical understanding has lagged behind practice. In particular, practical implementations incorporate Nesterov momentum, compute the polar factor only approximately, and operate with stochastic gradients that may be heavy-tailed. We close this gap by developing a convergence theory for Muon with Nesterov momentum and inexact polar decomposition in non-convex matrix optimization under heavy-tailed noise. Our analysis builds on a unified framework for inexact polar decomposition that captures practical iterative approximations such as Newton-Schulz and quantifies how their errors propagate through the optimization dynamics. Under this framework, we establish an optimal iteration and sample complexity of $O \left(\varepsilon^{\frac{-(3\alpha-2)}{(\alpha-1)}} \right)$ for finding an $\varepsilon$-stationary point, where $\alpha\in(1,2]$ denotes the heavy-tail index. For the inexact-polar setting with $\sigma_1=0$, we also provide guarantees that do not require prior knowledge of $\alpha$. We analyze a randomized low-rank polar decomposition that is substantially more efficient than full-space methods while remaining compatible with our theory. Numerical experiments further demonstrate the effectiveness of the proposed inexact and randomized variants.
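The Newton-Schulz approximation named in the abstract can be sketched in a few lines (the normalization choice and iteration count here are illustrative, not the paper's tuned settings):

```python
import numpy as np

def newton_schulz_polar(M, iters=40):
    # Iterate X <- 1.5 * X - 0.5 * X @ X.T @ X, which drives every
    # singular value of X toward 1 while keeping the singular vectors,
    # so X converges to the polar factor U of M = U P.
    # Frobenius normalization puts all singular values in (0, 1],
    # inside the iteration's basin of attraction (0, sqrt(3)).
    X = M / np.linalg.norm(M)
    for _ in range(iters):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X
```

Muon applies an inexact polar factor of this kind to a momentum matrix; the paper's analysis quantifies how the residual error of a finite number of iterations propagates into the convergence rate.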
math.OC 2026-05-08 2 theorems

Generic atoms yield simplicial faces in the pseudo-moment cone

Simplicial Regularizability of the Pseudo-Moment Cone and Carath\'eodory-Type Atomic Decomposition of Moment Matrices

This geometry enables an efficient Carathéodory-style algorithm to recover the atoms from O(n^d) moment matrices.

Figure from the paper
abstract
We study the facial geometry of the homogeneous pseudo-moment cone \(\Sigma_{n,2d}^*\) and its implications for atomic decomposition of moment matrices. For fixed \(d \ge 2\), we show that if a moment matrix is formed by \(O(n^d)\) generically chosen weighted atoms, then its minimal face in the matrix realization of the pseudo-moment cone is \emph{simplicial} and generated by the planted rank-one atoms. Based on this geometric result, we develop a Carath\'eodory-type extreme-ray decomposition algorithm for spectrahedral cones and show that, when specialized to the pseudo-moment cone, it yields an efficient atomic decomposition method for generically generated moment matrices in the same regime. A stabilized numerical implementation demonstrates strong recovery performance and suggests that, outside the guaranteed regime, the algorithm may serve as a practical sampler of high-rank extreme rays.
math.OC 2026-05-08

Multiple trust regions improve high-dimensional Bayesian optimization

MTRBO: Multiple trust-region based Bayesian optimization

MTRBO allocates separate regions for exploitation and exploration, proves global convergence, and beats prior methods on non-convex testbeds

abstract
Bayesian Optimization (BO) is a popular framework for optimizing black-box functions. Despite its effectiveness, BO is often inefficient for high-dimensional problems due to the exponential growth of the search space, heterogeneity of the objective function, and low sampling budget. To overcome these issues, this work proposes a multiple trust-region-based Bayesian optimization technique (MTRBO). A trust region is a localized region within which an optimization model is trusted to approximate the objective function accurately. Assuming a Gaussian process (GP) as a prior belief about the objective function and based on the posterior mean and variance functions, the method adaptively exploits near the current promising solution inside one trust region, and explores the most uncertain region of the search space inside another. The theoretical global convergence property of the proposed method is established. The work is then benchmarked against other state-of-the-art trust-region-based Bayesian optimization algorithms, demonstrating superior performance on a variety of non-convex and high-dimensional test functions. The proposed method outperforms others in terms of solution quality within the sampling budget (the number of function evaluations). The proposed method is applied to the portfolio optimization problem to verify its applicability in real-world scenarios.
math.OC 2026-05-08

Single-query ZO reaches Goldstein points at O(d² δ^{-3} ε^{-3}) cost

Stochastic Non-Smooth Non-Convex Optimization with Decision-Dependent Distributions

The bound holds for non-smooth non-convex objectives whose sampling distribution shifts with the decision and requires only one noisy function value per iteration.

Figure from the paper
abstract
We study stochastic zeroth-order optimization with decision-dependent distributions, where the sampling law depends on the current decision and only noisy function values are available. For the non-smooth non-convex setting, we establish an explicit convergence guarantee for finding a $(\delta,\epsilon)$-Goldstein stationary point with stochastic zeroth-order oracle (SZO) complexity of $\mathcal{O}(d^2\delta^{-3}\epsilon^{-3})$. In addition, we show that the above complexity can be achieved with single SZO feedback per iteration. We further extend the analysis to smooth and Hessian-Lipschitz objectives, obtaining complexities $\mathcal{O}(d^2\epsilon^{-6})$ and $\mathcal{O}(d^2\epsilon^{-9/2})$, respectively. In the Hessian-Lipschitz case, this improves the best-known dependence on $\epsilon$ for decision-dependent zeroth-order methods by a factor of $\epsilon^{-1/2}$.
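The single-feedback oracle can be illustrated with the classical one-point randomized estimator (our sketch; the paper's smoothing and step-size choices differ):

```python
import numpy as np

def one_point_zo_grad(f, x, delta, rng):
    # One noisy function value per call: sample u uniformly on the
    # unit sphere and return (d / delta) * f(x + delta * u) * u,
    # an unbiased estimate of the gradient of a delta-smoothed f.
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    return (d / delta) * f(x + delta * u) * u
```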
math.OC 2026-05-08

RL learns to pick cuts that speed Benders decomposition

Learning to Cut: Reinforcement Learning for Benders Decomposition

Policy gradient training yields faster solves than standard or supervised methods on electric vehicle problems and generalizes to new sizes.

Figure from the paper
abstract
Benders decomposition (BD) is a widely used solution approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often suffers from slow convergence as the master problem grows with an increasing number of cuts. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained using a policy gradient method via the REINFORCE algorithm. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD achieves substantial improvements in computational efficiency and exhibits strong generalization to problems with similar structures but varying data inputs and decision variable dimensions.
math.OC 2026-05-08

Vectorized formulation linearizes causality for batch SOC

Global self-optimizing control of batch processes

Reformulation converts dynamic constraints to linear ones, enabling shortcut selection of simple repetitive measurement combinations.

Figure from the paper
abstract
This work aims to achieve near-optimal operation for a class of batch processes by employing self-optimizing control (SOC). Compared with a continuous process, a batch process exhibits stronger nonlinearity and dynamics because of its non-steady operating condition. This necessitates a global version of SOC to achieve satisfactory performance. Meanwhile, it also makes the existing global SOC (gSOC) not directly applicable to batch processes due to the causality amongst variables. Therefore, it is necessary to extend the original gSOC to batch processes. In addition to the nonconvexity challenge of the original gSOC problem, the new extension for batch processes has to face even more challenges. Particularly, the causality due to dynamics of batch processes brings in structural constraints on controlled variables (CVs), making a CV selection problem even more difficult. To address these challenges, the gSOC problem is recast in a vectorized formulation and it is proved that the structural constraints considered are linear in the vectorized formulation. Moreover, a novel shortcut method is proposed to efficiently find sub-optimal but more transparent solutions for this problem. The effectiveness of the new approach is validated through a case study of a fed-batch reactor, where CVs are constructed through a combination matrix with a repetitive structure, resulting in a simple SOC scheme. This simplicity facilitates the implementation of the SOC approach and enhances its practical applicability and robustness.
math.OC 2026-05-08 Recognition

Adversary injection makes unobservable quadratic systems observable

Confidentiality of Linear Control Systems with Quadratic Output Under Sensor Attacks [Extended Version]

By adding a crafted signal to sensor measurements, an attacker enables local exponential state estimation in systems that are otherwise unobservable.

Figure from the paper
abstract
We study the state estimation problem for linear control systems with quadratic outputs which are locally unobservable at the equilibrium. We show that, despite this inherent lack of observability, an adversary with sensor read and write capability can induce observability by injecting an appropriate signal into the measurement channel. Taking the role of an adversary, we characterize when an injected signal can or cannot induce observability and, in the successful case, construct an observer that achieves local exponential convergence of state estimates to the true states of the system. A simulation study demonstrates our results.
