Multi-objective optimization is central to many engineering and machine learning applications, where multiple objectives must be optimized in balance. While multi-gradient based optimization methods combine these objectives in each step, such methods require computing gradients with respect to all variables at every iteration, resulting in high computational costs in large-scale settings. In this work, we propose a framework that simultaneously alternates the optimization of each objective and the (stochastic) gradient update with respect to each variable block. Our framework reduces per-iteration computational cost while enabling exploration of the Pareto front by allocating a prescribed number of gradient steps to each objective. We establish rigorous convergence guarantees across several stochastic smooth settings, including convex, non-convex, and Polyak-Lojasiewicz conditions, recovering classical convergence rates of single-objective methods. Numerical experiments demonstrate that our framework outperforms non-alternating methods on multi-target regression and produces a competitive Pareto front approximation, highlighting its computational efficiency and practical effectiveness.
Diffusion models are routinely guided in practice by combining multiple score fields, yet the mathematical structure of score mixing is still poorly understood. We study the small-time generation dynamics driven by mixed scores $$ s=\lambda\,\nabla\log u_1+(1-\lambda)\,\nabla\log u_2,\qquad \lambda\ge 0, $$ in the heat-flow framework, where $u_1,u_2$ are heat evolutions of two compactly supported probability measures. This single formulation covers both the mixture-of-experts regime $(0\leq \lambda\leq 1)$ and the classifier-free guidance regime $(\lambda>1)$. Exploiting a Laplace-Varadhan principle under a similarity-time rescaling, we show that the small-time generation dynamics is governed by the explicit geometric potential $$ \Phi_\lambda=\lambda d_1^2+(1-\lambda)d_2^2, $$ which depends only on the supports of the initial measures and on the mixing parameter. This gives a rigorous reduction from a singular, non-autonomous score-driven dynamics to autonomous Clarke-type subgradient inclusions. In the empirical setting of finite Dirac mixtures, the limiting potential is piecewise quadratic with a Voronoi-type structure; this rigidity yields convergence of all autonomous limiting trajectories to critical points and a conditional convergence criterion for the original generation flow toward local minimizers of the potential, with rate $\mathcal O(\sqrt t)$ in the smooth stable case.
Hessian-driven damping in second-order primal-dual dynamics reduces oscillations while guaranteeing strong convergence under time-varying Tâ
abstractclick to expand
In this paper, we propose a class of general second-order primal-dual dynamical systems with Tikhonov regularization and Hessian-driven damping for solving convex-concave bilinear saddle point problems. The proposed dynamical system incorporates five general time-varying terms: viscous damping, time scaling, extrapolation, Tikhonov regularization, and Hessian-driven damping parameters. Under suitable parametric conditions, we analyze the asymptotic convergence properties of the dynamical system by constructing appropriate Lyapunov functions. Specifically, we obtain the convergence rate of the primal-dual gap and the boundedness of trajectories in the proposed dynamical system, and provide some integral estimates. Furthermore, we theoretically prove that the trajectories generated by the dynamical system converge strongly to the minimum-norm solution of the saddle point problem, and fully demonstrate that Hessian-driven damping can effectively alleviate oscillations. Finally, numerical experiments are conducted to verify the validity of the above theoretical results.
Moment-QSOS relaxations formulated directly in quaternions give monotonic lower bounds and scale via sparsity.
abstractclick to expand
This paper introduces a Moment-Quaternion-Sum-of-Squares (QSOS) hierarchy for solving a class of quaternion polynomial optimization problems. This hierarchy is formulated directly in the quaternion domain and consists of a sequence of semidefinite programming (SDP) relaxations that provide monotonic lower bounds on the optimal value. To improve scalability, we incorporate correlative sparsity, which can significantly reduce the size of the resulting SDPs for large-scale sparse problems. Furthermore, we introduce a strengthened QSOS relaxation, which enhances the tightness of the standard relaxation by enlarging the monomial basis in a controlled manner. Our various Numerical experiments show that our approach provides comparable bounds to existing approaches, while significantly reducing computation time and memory usage. In particular, applications to the quaternion-based maximum margin criterion problem and the classical orientation synchronization problem illustrate the practical effectiveness of the framework.
We analyze the problem of stochastic optimal control of SDEs where the driver includes a self-exciting stochastic process. Due to the non-Markovian nature of the problem, we apply the stochastic maximum principle approach. We derive a sufficient stochastic maximum principle under this framework. We also derive an expression via martingales of both the self-exciting process and its quadratic covariation. Furthermore, we derive a necessary maximum (equivalence principle) for the self-exciting stochastic control problem. Finally, we look at an application to log-utility.
The design adapts the reference for wave delays in the nonlinear PDE-ODE model, keeping tracking error inside a pre-set funnel as shown in 1
abstractclick to expand
We study the output tracking problem for a vertically driven drill string system described by a nonlinear boundary-coupled PDE-ODE model. Solvability analysis of the drill string model is achieved by first casting the model in an abstract boundary value problem involving set-valued operators on an appropriate Hilbert space. The governing equation here consists of evolution and the damping part. Existence of solutions is established within the framework of maximal monotone operators where one first proves that the evolution operator is a linear skew-adjoint operator and the distributed damping term is a Nemytskii relation which is then proven to be maximal monotone. Maximal monotonicity of the combined operator is then a consequence of Rockafellar's theorem. Furthermore, we propose a novel funnel control design that ensures the angular velocity of the drill bit follows a dynamically adjusted reference trajectory, while the tracking error remains confined within a pre-specified performance funnel. The reference adjustment mechanism adapts in response to large wave traveling times that may cause performance degradation. The corresponding feasibility result is illustrated by some simulations.
We develop a two-stage stochastic multi-commodity flow model to design a resilient maritime energy supply network under correlated chokepoint disruptions. A planner selects strategic inventories and infrastructure activations prior to uncertainty resolution, then routes crude oil, LNG, LPG, and fertilizer through a capacitated network. Three features distinguish this model: disruption scenarios are \emph{correlated}, reflecting the reality that proximate chokepoints (e.g., Hormuz, Bab el-Mandeb) fail jointly; scenario probabilities depend endogenously on first-stage decisions via affine distortion, formalizing \emph{risk exposure through utilization}; and a mean-CVaR objective mitigates tail-risk shortages. Methodologically, the decision-dependent probability model admits an exact MILP reformulation via McCormick linearization. CVaR preserves scenario-wise decomposability, and our Benders decomposition with corridor-based group-failure cuts converges finitely. The model is calibrated to Indian maritime energy imports (16 nodes, 28 arcs) using EIA, UNCTAD, World Bank, and operational data from the 2026 Hormuz crisis. Benders recovers the extensive-form optimum for scenario sizes up to $|S|=729$ with a constant iteration count (10-11). Empirically, the value of the stochastic solution (VSS) is 14.8%; the value of decision-dependent probabilities (VEP) ranges from 0.93% to 8.18%. The mean-CVaR frontier exhibits a design phase transition at confidence level $\alpha\approx 0.75$. Notably, the value of modeling correlation is identically zero across stress tests: the network's diversified portfolio absorbs joint-corridor disruptions using the same hedging mechanisms as single-corridor disruptions (\emph{structural joint-failure resilience}). Finally, LPG emerges as the most exposed commodity, whereas crude oil is fully hedgeable via reserves and pipeline bypasses.
This paper addresses the practical challenge in Entropic Optimal Transport (EOT) where the underlying ground cost function is typically latent and unobserved. Rather than assuming a fixed geometric cost, we adopt a data-driven approach where a shared cost is revealed only through samples from a reference optimal coupling. The question is then: given samples from a reference optimal coupling, can we recover the optimal coupling for new marginals under the same latent cost? We introduce a generative transfer framework that recovers the optimal coupling for new marginals by utilizing an iterative path-wise tilting algorithm. Unlike static importance reweighting, this method evolves the coupling jointly with a marginal transport path, allowing mass to move beyond the reference support. We derive sample-level learning rules for these infinitesimal updates, which yield covariance-type evolution equations for the associated transport vector fields. By integrating this dynamics with Conditional Flow Matching (CFM), we produce a practical sampler for paired data. Finally, we provide theoretical guarantees establishing a global convergence rate of \mathcal{O}(\delta), ensuring the generated coupling converges to the target EOT plan in W_1 distance.
Efficient methods for non-convex black-box optimization largely rely on sampling. In this context, the Zeroth-Order Proximal Operator (ZOPO) and the corresponding Zeroth-Order Proximal Point Algorithm (ZOPPA) have attracted significant interest, as they combine the advantage of requiring only objective evaluations with the powerful theoretical framework of proximal point algorithms. ZOPO depends on a temperature parameter which, when going to zero, reduces ZOPO to the exact proximal operator. By exploiting this property, the vanishing-temperature regime has been leveraged in several works to obtain theoretical guarantees via inexact proximal methods. However, this regime is computationally unsustainable when sampling is used to estimate ZOPO, since the corresponding Monte Carlo estimators suffer from severe variance issues. We therefore propose a comprehensive analysis of ZOPO for any fixed positive temperature, and prove convergence of ZOPPA under minimal assumptions on the objective function. We do so by demonstrating that ZOPPA can be interpreted as an exact proximal point method applied to an auxiliary smoothed objective, rather than an inexact method on the original function. Importantly, we further derive explicit guarantees connecting this smoothed problem back to the original objective and establish convergence results for the sampled method (S-ZOPPA) at a fixed temperature.
In this work, we develop proximal preconditioned gradient methods with a focus on spectral gradient methods providing a proximal extension to the Muon and Scion optimizers. We introduce a family of stochastic algorithms that can handle a wide variety of convex and nonconvex constraints and study its convergence under heavy-tailed noise, through a novel analysis tailored to the geometry of the proposed methods. We further propose a variance-reduced version, which achieves faster convergence under standard noise assumptions. Finally, we show that the polynomial iterations used in Muon are more accurately captured by a nonlinear preconditioner than by the ideal matrix sign, leading to a convergence analysis that more faithfully reflects practical implementations.
We introduce a proximal limited--memory quasi--Newton scheme for minimizing the sum of a continuously differentiable function and a proper, lower semicontinuous and prox-bounded, possibly nonsmooth, function. Both functions might be nonconvex. The method builds upon the computation of scaled proximal operators and is globalized by adaptively updating a regularization parameter based on a criterion of sufficient decrease. We prove global convergence under mild assumptions and then establish convergence of the entire sequence (with rates) under the Kurdyka--Lojasiewicz property. To efficiently solve the subproblems, we exploit the compact representation of limited-memory quasi-Newton updates. We derive also a compact representation of the limited--memory Kleinmichel formula, a rank-one quasi-Newton scheme that preserves positive definiteness under the same condition as the BFGS update. Numerical results show a significant speed up compared to other methods.
An observability inequality is proven for singular metrics modeling acoustic waves in planets by reducing to the separable case and using In
abstractclick to expand
We study the observability of waves on gas giant manifolds which are a class of Riemannian manifolds whose metrics are singular at the boundary. Such manifolds arise naturally in modeling of acoustic wave propagation in gas giant planets.We establish an observability inequality using full boundary measurements given by a Neumann-type trace that is natural in the gas giant setting. The proof proceeds in two steps. First, observability for a general gas giant metric is reduced to the so-called separable case via a perturbation argument. In the separable case, we employ a uniform-in-tangential-frequency analysis combined with an Ingham inequality to prove observability.
Training a deep neural network with the outputs of selected layers satisfying linear constraints is required in many contemporary data-driven applications. While this can be achieved by incorporating projection layers into the neural network, its end-to-end training remains challenging due to the lack of rigorous theory and efficient algorithms for backpropagation. A key difficulty in developing the theory and efficient algorithms for backpropagation arose from the nonsmoothness of the solution mapping of the projection layer. To address this bottleneck, we introduce an efficiently computable HS-Jacobian to the projection layer. Importantly, we prove that the HS-Jacobian is a conservative mapping for the projection operator onto the polyhedral set, enabling its seamless integration into the nonsmooth automatic differentiation framework for backpropagation. Therefore, many efficient algorithms, such as Adam, can be applied for end-to-end training of deep neural networks with linear constraints. Particularly, we establish convergence guarantees of the HS-Jacobian based Adam algorithm for training linearly constrained deep neural networks. Extensive experiment results on several important applications, including finance, computer vision, and network architecture design, demonstrate the superior performance of our method compared to other existing popular methods.
We study bilevel optimization with a fixed polyhedral lower feasible set. Such problems are challenging for two reasons: active-set changes can make the upper objective nonsmooth, and existing hypergradient methods typically require lower-Hessian inversions or equivalent linear solves, which are computationally expensive. To address these issues, we adopt a logarithmic barrier smoothing of the lower problem to obtain a differentiable approximation of the constrained bilevel objective, and develop a proxy-gradient algorithm for the resulting barrier-smoothed surrogate. The algorithm uses only gradients of the upper and lower objectives; its only second-order object is the explicit logarithmic barrier Hessian determined by the fixed polyhedral constraints. Barrier smoothing restores differentiability, but Euclidean smoothness constants are not uniformly bounded near the boundary. We therefore develop a local Dikin-geometry analysis in which the barrier-metric provides an oracle-free curvature scale near the moving lower centers. This leads to barrier-aware schedules that keep the iterates inside locally well-behaved regions. For the barrier-smoothed objective, we prove stationarity rates of $\widetilde{O}(K^{-2/3})$ in the deterministic setting and $\widetilde{O}(K^{-2/5})$ under upper-level-only bounded stochastic noise after $K$ outer iterations, together with quantitative bias control as the barrier parameter decreases.
Small defect components let the convex relaxation decompose into exact low-dimensional problems and recover a unique rank-one solution in O(
abstractclick to expand
We study the doubly nonnegative (DNN) relaxation of the standard quadratic optimization problem \[ \min\{x^\top Qx:\ x\in\Delta^{n-1}\},\qquad \Delta^{n-1}:=\{x\in\mathbb{R}_+^n:\ \mathbb{1}^\top x=1\}, \] for random symmetric matrices with independent diagonal and off-diagonal entries. Let $m_n:=\min_{1\le i\le n} Q_{ii}$ and set $M:=Q-m_nE$, where $E$ is the all-ones matrix. The negative off-diagonal entries of $M$ define a defect graph $G_n^-$. Under entrywise independence, absolute continuity, and the tail-decay condition $n^5\mathbb{E}[F_O(m_n)^4]\to 0$, where $F_O$ is the off-diagonal distribution function, we prove that with probability tending to one every defect component has size at most $4$. On this event, the shifted DNN value decomposes over defect components. Since the DNN and completely positive cones coincide in dimensions at most four, each local relaxation is exact. A finite KKT-candidate argument gives local uniqueness, and absolute continuity rules out ties, so the global DNN optimizer is unique and rank one. The graph estimate uses the fact that every connected component of size at least five contains a tree on exactly five vertices. For Gaussian orthogonal ensemble data, we prove the explicit bound \[ \mathbb{P}\bigl(\text{the DNN optimizer is unique and rank one}\bigr) \ge 1-K\frac{(\ln n)^2}{n^3}. \] On the same event, the exact optimizer can be recovered in $O(n^2)$ time by constructing the defect graph and solving constant-size local KKT systems. We also verify the tail condition for variance-tuned Gaussian Wigner models, heavy-tailed laws, and finite-lower-endpoint laws.
We study the least-energy way to reshape a probability distribution when motion is constrained to a horizontal bundle, that is, optimal transport and distribution steering in sub-Riemannian geometry, motivated by density control over underactuated systems. To obtain a continuous and numerically tractable formulation, we introduce an entropic regularization by adding small noise aligned with the control directions and study the associated Schrodinger bridge problem. The resulting reference process is a degenerate diffusion on the sub-Riemannian manifold. Under bracket-generating hypotheses we obtain smooth, strictly positive transition densities and a forward--backward characterization of the optimal bridge. This leads to a practical Sinkhorn-type algorithm for the Schrodinger potentials and, as the noise level vanishes, a recovery of the deterministic sub-Riemannian optimal transport problem. We demonstrate with a numerical example.
Positive vectors a, b, c, d satisfying the residual equations yield the upper bound that matches lower bounds from quadratic and Huber cases
abstractclick to expand
We prove, for every horizon N >= 3, the existence of the strengthened low-rank performance-estimation certificate proposed by Grimmer, Shu, and Wang for the Drori-Teboulle minimax nonnegative constant-stepsize problem for gradient descent. Let rho_N in (0,1) be the unique solution of rho_N^{2N}(2N rho_N+2N+1)=1. We show that the GSW certificate equations admit positive vectors a, b, c, d satisfying all residual equations. The proof proceeds through a reduced residual system in the variables d, a simplex existence argument for a positive reduced zero, a terminal residual completion identity, and a tail-square convolution argument proving the cumulative margins that force positivity of the certificate coefficients. Consequently, the GSW low-rank PEP certificate exists for every N >= 3 and yields the Drori-Teboulle upper bound. Together with the one-dimensional quadratic and Huber lower-bound examples, this establishes the Drori-Teboulle minimax nonnegative constant-stepsize value for gradient descent for every N >= 3.
This paper proposes a Byzantine-resilient consensus framework that simultaneously pursues two tightly coupled objectives: actively identifying Byzantine agents and guaranteeing resilient consensus among normal agents. Unlike existing methods that treat adversary mitigation as a passive filtering process, our approach embeds an active reputation learning mechanism into the consensus loop. Agents evaluate neighbors' behaviors using outlier-robust loss functions and historical information, and construct a reputation vector on a probability simplex via a mechanism that balances loss minimization with diversity-preserving exploration, representing dynamic beliefs over neighbor trustworthiness. These reputations are then used to form weighted local updates that suppress adversarial influence and improve agreement among normal agents, thereby reducing the bias in local loss evaluations and enabling more reliable subsequent reputation estimation. This learning-control co-design yields a closed-loop dual objective: improved consensus states enhance Byzantine identifiability, while refined reputations in turn improve consensus. A range of distributed systems experiments, benchmarking against classical resilient consensus methods, demonstrate superior Byzantine detection accuracy and significantly more reliable and scalable consensus.
Computing the unregularized Wasserstein barycenter for measure-valued data is a challenging optimization task. Recent algorithms have been tailored to either discrete measures as point clouds or continuous measures discretized on regular grids. In this work, we propose a primal mirror descent algorithm for computing the exact Wasserstein barycenter in the Fisher-Rao geometry. Our algorithm is a unified approach that is flexible enough to simultaneously cover discrete and absolutely continuous input measures, with convergence guarantees in both settings. In particular, when all input measures are discrete, our algorithm, initialized from any probability density, solves a sequence of semi-discrete optimal transport subproblems and produces absolutely continuous iterates that converge to the discrete barycenter. We use synthetic and real data examples to demonstrate the promising result in terms of accuracy and computational cost.
This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has JSR exactly the discount factor $\gamma\in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $\gamma$. Therefore, the deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $\gamma$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry after the redundant all-ones component is removed.
We propose an implicit neural formulation of optimal transport that eliminates adversarial min--max optimization and multi-network architectures commonly used in existing approaches. Our key idea is to parameterize a single potential in the Kantorovich dual and reformulate the associated c-transform as a proximal fixed-point problem. This yields a stable single-network framework in which dual feasibility is enforced exactly through proximal optimality conditions rather than adversarial training. Despite the inner fixed-point computation, gradients can be computed without differentiating through the fixed-point iterations, enabling efficient training without requiring implicit differentiation. We further establish convergence of stochastic gradient descent. The resulting framework is efficient, scalable, and broadly applicable: it simultaneously recovers forward and backward transport maps and naturally extends to class-conditional settings. Experiments on high-dimensional Gaussian benchmarks, physical datasets, and image translation tasks demonstrate strong transport accuracy together with improved training stability and favorable computational and memory efficiency.
Non-global minimizers are unstable in the mean-field limit when initial parameters have full support, covering attention layers and vector-s
abstractclick to expand
A surprising phenomenon in the training of neural networks is the ability of gradient descent to find global minimizers of the training loss despite its non-convexity. Following earlier works, we investigate this behavior for wide shallow networks. Existing results essentially cover the case of ReLU activations and the case of sigmoid activations with scalar output weights. We study a large class of models that includes multi-head attention layers and two-layer sigmoid networks with vector output weights. Building upon [Chizat and Bach, 2018], we prove that all non-global minimizers of the training loss are unstable under gradient descent dynamics. Thus, when the initial distribution of the parameters has full support (which includes the popular Gaussian case), and in the many hidden neurons or attention heads limit, continuous-time gradient descent can only converge to global minimizers. Establishing the instability of non-global minimizers corresponds to the construction of an ``escaping active set'' -- we complete the proof of [Chizat and Bach, 2018] to construct this set for models with bounded nonlinearities and scalar output weights. We also extend this construction to new cases for models with vector output weights. Finally, we show the well-posedness and the stability with respect to discretization of the mean field training dynamic for sub-Gaussian initializations.
Decentralized collision avoidance remains challenging, particularly when agents do not communicate any information related to planned trajectories. Most existing approaches either rely on conservative coordination mechanisms or provide limited guarantees on recursive feasibility and convergence. This paper develops a decentralized contingency MPC framework for multi-agent systems with nonlinear dynamics that achieves collision-free motion under a state-only information pattern. Each agent follows the same consensual rule set, enabling safe decentralized planning without communication. Each agent solves a local optimization problem that couples a nominal trajectory with a contingency certificate ensuring a feasible backup maneuver under receding-horizon operation. A novel geometric and decentralized safe-set update mechanism prevents feasibility loss between consecutive time steps. The resulting scheme guarantees recursive feasibility, including collision avoidance, and establishes a Lyapunov-type convergence result to an admissible safe equilibrium. Simulation results demonstrate performance in both sparse and dense multi-agent environments, including cluttered bottleneck scenarios and under plug-and-play operation.
The bound depends only on input bit length and is paired with an ellipsoid-based algorithm that minimizes the effective handicap via row and
abstractclick to expand
Linear Complementarity Problems (LCPs) with sufficient matrices form an important subclass of LCPs, and it remains a significant open question whether problems in this class can be solved in polynomial time. Kojima, Megiddo, Noma, and Yoshise gave an Interior Point Algorithm (IPA) in 1991, that can solve LCPs with sufficient matrices in time bounded polynomially in the input size and the so-called handicap number $\hat\kappa(M)$ of the coefficient matrix $M$. However, this value can be exponentially large in the bit encoding length. In fact, no upper bounds were previously known on $\hat\kappa(M)$. Settling an open question raised in de Klerk and E.-Nagy (Math Programming, 2011), we give an exponential upper bound on $\hat\kappa(M)$ in the bit-complexity of $M$. This is based on a new characterization of sufficient matrices. The new characterization also leads to a simple new proof of V\"aliaho's theorem on the equivalence of sufficient and $\mathcal{P}^*$-matrices (Linear Algebra and its Applications, 1996).
Noting that one can obtain an equivalent LCP by rescaling the rows and columns by a positive diagonal matrix, we define $\hat\kappa^\star(M)$ as the best possible handicap number achievable under such rescalings. Our second main result is an algorithm for LCPs with sufficient matrices, where the running time is polynomially bounded in the input size and in the optimized value $\hat\kappa^\star(M)$. This algorithm is based on the observation that the set of near-optimal row-rescalings forms a convex set. Our algorithm combines the Ellipsoid Method over the set of row rescalings, and an IPA with running time dependent on the handicap number of the matrix. If the IPA fails to solve the LCP in the desired running time, it provides a separation oracle to the Ellipsoid Method to find a better rescaling.
The incomplete round robin sports tournament format, where each team plays the same number of games but faces only a subset of the other teams, is becoming increasingly popular in both youth and professional competitions. In contrast to conventional round robin tournaments, however, neighborhood structures for scheduling incomplete round robin tournaments have largely remained unexplored. We fill this gap by proposing two novel neighborhood structures and describe them in graph theory terms. One of them introduces a single new game followed by a minimal repair chain, while the other introduces possibly many new games but only affects a single round. The latter is shown to fully connect the solution space. We embed the neighborhoods in an adaptive late acceptance hill climbing algorithm and show that the proposed algorithm obtains high quality and new best solutions for several sets of instances from the literature, thereby empirically confirming the effectiveness of the proposed neighborhoods.
The Frank-Wolfe algorithm achieves a convergence rate of $\mathcal{O}(1/T)$ for smooth convex optimization over compact convex domains, accelerating to $\mathcal{O}(1/T^2)$ when both the objective and the feasible set are strongly convex. This acceleration extends beyond strong convexity: Kerdreux et al. (2021a) proved rates of $\mathcal{O}(T^{-p/(p-1)})$ over $p$-uniformly convex feasible sets, a class that interpolates between strongly convex sets and more general curved domains such as $\ell_p$ balls. In this work, we establish a matching $\Omega(T^{-p/(p-1)})$ lower bound for every $p\ge 3$ under exact line search or short steps, and extend the lower bound to objectives satisfying a H\"olderian error bound. The proofs analyze the dynamics of Frank-Wolfe iterates on simple instances and hence are not limited to the high-dimensional setting, unlike information-theoretic lower bounds.
We propose a Riemannian limited-memory BFGS method for optimization problems with Euclidean bounds. The method combines a limited-memory quasi-Newton update in the tangent space with a Riemannian adaptation of the generalized Cauchy point strategy from classical L-BFGS-B, enabling efficient handling of Euclidean bounds while exploiting the geometric structure of the optimization domain. This setting is important in several applications, including covariance matrix estimation with bounded variance, neuroimaging, EEG signal classification, and other signal processing or computer-vision tasks that couple manifold variables with constrained Euclidean parameters.
We provide a generic algorithmic framework and an implementation of the algorithm in the Manopt.jl library. Numerical experiments on benchmark problems indicate only minor reduction in performance on Euclidean problems compared to the classical L-BFGS-B method, while outperforming interior-point methods. Furthermore, the algorithm was tested on two mixed manifold and bounded Euclidean problems: amplitude-limited blind source separation with Gaussianity penalization and bounded-variance maximum likelihood common principal components analysis. The proposed method outperforms existing methods by several orders of magnitude.
Leader randomizes protections by relaxing the attacker's problem and shifting to vertex distributions, yielding a polynomial-time guarantee.
abstractclick to expand
We study a new bilevel optimization problem, termed the Randomized Max-Vertex-Cover Interdiction (RMVCI) problem under matroid constraints, which can be modeled as a zero-sum Stackelberg game on a network between a leader and a follower. The leader randomly selects a subset of vertices to protect, subject to a matroid constraint, while the follower-after inferring the leader's protection probability distribution-chooses a subset of vertices (also matroid-constrained) to attack, aiming to maximize the expected total weight of edges incident to the set of vertices that are both attacked and unprotected. The leader's objective is to determine an optimal randomized interdiction strategy that minimizes the follower's expected payoff.
Since the follower's response problem is NP-hard, the resulting bilevel program is computationally challenging. We develop a conceptual approximation framework for tackling general bilevel interdiction problems. For the RMVCI problem under matroid constraints, we first formulate the follower's problem as an integer linear program and show that its linear relaxation admits a tight integrality gap of $\tfrac{4}{3}$. Within the approximation framework, we replace the follower's problem by its LP relaxation, and then study the resulting bilevel program. By shifting from distributions over sets to distributions over vertices and applying our approximation framework, we manage to design a polynomial-time 2-approximation algorithm for this relaxed bilevel problem. Combining these ingredients within our framework yields a polynomial-time $\tfrac{8}{3}$-approximation algorithm for RMVCI under matroid constraints.
We develop a backstepping-based observer design for a class of ODE - continuum-PDE cascade systems, which can be viewed as the limit, of a finite collection of ODE - $2 \times 2$ hyperbolic systems, as the number of individual PDE system components tends to infinity. The large-scale collection of ODE - $2 \times 2$ hyperbolic systems is motivated by a dynamic model that we present, of a network of peripheral arteries, to which central (aortic) blood flow/pressure enters. We address a case in which average (boundary) measurements, over the ensemble dimension, are available, which is motivated by the availability of non-invasive, peripheral flow/pressure measurements. Exponential stability of the estimation error system is shown by proving well-posedness of the kernel equations and constructing a Lyapunov functional. We also establish that part of the backstepping kernels derived coincide with the solution of a Sylvester equation. We then apply the continuum-based observer for state estimation of the large-scale counterpart and, in particular, of the blood flow system, introducing an approach for optimal construction of continuum approximations. We also introduce an implementation method, adopting a spectral-based approach for computing the observer dynamics, which we illustrate in an academic, numerical simulation example. Furthermore, we illustrate the design in the problem of central flow/pressure estimation using realistic parameters and flow/pressure waveforms.
This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.
We study the parameterized complexity of testing approximate first-order stationarity at a prescribed point for continuous piecewise-affine (PA) functions, a basic task in nonsmooth optimization. PA functions form a canonical model for nonsmooth stationarity testing and capture the local polyhedral geometry that appears in ReLU-type training losses. Recent work by Tian and So (SODA 2025) shows that testing approximate stationarity notions for PA functions is computationally intractable in the worst case, and identifies fixed-dimensional tractability as an open direction.
We address this direction from the viewpoint of parameterized complexity, with the ambient dimension $d$ as the parameter. In this paper, we give XP algorithms in fixed dimension for the tractable sides, and prove W[1]-hardness for the complementary sides. Moreover, lower bounds under the Exponential Time Hypothesis rule out algorithms running in time $\rho(d)\size^{o(d)}$ for any computable function $\rho$, where $\size$ denotes the total binary encoding length of the stationarity-testing instance. As a further consequence, our results yield the corresponding parameterized complexity picture for testing local minimality of continuous PA functions. We further extend our hardness results to a family of shallow ReLU CNN training losses, with stationarity tested in the trainable weight space. Thus, the same parameterized-complexity picture also appears for simple CNN training losses.
A mapping of PDE inputs to the ODE and an estimator-observer scheme achieve exponential stability and boundedness for the closed-loop system
abstractclick to expand
In this article, we investigate the problem of exponential stabilization via output feedback for a cascaded system composed of an ordinary differential equation (ODE) and a wave partial differential equation (PDE) under boundary control. Four types of boundary interconnections are considered. In the absence of disturbances, a novel transformation is introduced to incorporate the PDE boundary control input into the ODE subsystem. Based on this transformation, a state feedback controller is designed to achieve exponential stability for both the ODE and PDE components. When internal uncertainties and external disturbances that match the control structure are present, a disturbance estimator is constructed. Utilizing this estimator, a Luenberger-type state observer is developed to reject the disturbances and exponentially stabilize the original system via an observer-based control scheme. Furthermore, the boundedness of the entire closed-loop system is rigorously established. Numerical simulations are provided to illustrate the theoretical results.
Recently, Muon and related spectral optimizers have demonstrated strong empirical performance as scalable stochastic methods, often outperforming Adam. Yet their behaviour remains poorly understood. We analyze stochastic spectral optimizers, including Muon, on a high-dimensional matrix-valued least squares problem. We derive explicit deterministic dynamics that provide a tractable framework for studying learning behaviour with a focus on (stochastic) SignSVD, which Muon approximates, and (stochastic) SignSGD, the latter serving as a proxy for Adam. Our analysis shows that for large batch size, SignSVD performs a square-root preconditioning with respect to the data covariance spectrum, while for small batch size smaller eigenmodes behave like SGD, slowing down convergence. We contrast with SignSGD which for generic covariance performs no preconditioning and has no transition, leading to different optimal learning rates and convergence characteristics. The two methods match up to a constant factor with isotropic data, but behave differently with anisotropic data. An analysis of a power law covariance model with data exponent $\alpha$ and target exponent $\beta$ shows there are three phases in the $(\alpha,\beta)$ plane: one where SignSGD is uniformly favored, one where SignSVD is uniformly favored, and a third where the two methods exhibit a trade-off in performance.
Classical persistent excitation criteria usually assess whether an input or regressor signal is sufficiently rich. In Koopman and EDMD with control (EDMDc), however, data quality is determined by the concatenation of lifted state features and control inputs. Input-rich data can still visit a narrow state region, well-spread state samples can still produce degenerate lifted features, and both can fail to condition the final regression problem.
This paper develops a diagnostic certificate framework for locating these failures. The certificates separate state-space coverage and clustering, lifted-feature nondegeneracy, and the final regression spectrum. The regression-spectrum certificate is the layer with direct theoretical guarantees: it controls the active standardized design's smallest singular value, has Fisher-information and one-step EDMDc stability interpretations, and admits a finite-sample lower bound under a population spectral gap. We also give structural examples and a Schur-complement condition showing why state, lifted, input, and regression diagnostics cannot be substituted for one another.
As a sampling example, IGPE-DOPT uses these certificates to score candidate trajectory segments. Experiments on Duffing, Van der Pol, and Lorenz systems compare input-, state-, lifted-, and regression-oriented baselines. The results show that certificate layers separate, budget and weights shift bottlenecks, and downstream prediction or control performance is not monotone in any single certificate. The framework is therefore intended as an interpretable diagnostic and data-collection guide, not as a universal optimality claim.
Decay of the uncontrolled system enables a smooth transition that reaches exact zero state at finite time T.
abstractclick to expand
This paper addresses the controllability of a class of quasi-linear parabolic equations governed by multiplicative controls with mobile support. To prove the existence of such a control forcing the solution to rest at time $T>0$, we first establish the decay property of solutions for the uncontrolled system. Unlike the case of the linear heat equation, the nonlinearity in the principal part of the operator introduces significant challenges. These difficulties necessitate a novel approach, ultimately leading us to solve the controllability problem within the framework of classical solutions. Through a carefully constructed smooth transition, we demonstrate that there exists a multiplicative control driving the state exactly to rest at time $t=T$.
This paper studies satisfying temporal logic specifications on stochastic dynamical systems, where the predicates evolve randomly over time. Such randomness may arise from uncertain environment models or external stochastic processes causing the sets associated with predicate satisfaction to vary in a non-deterministic manner. As a result, verifying whether a stochastic dynamical system satisfies a temporal specification depends also on the uncertainty in the predicates. We develop a certificate-based framework to bound the probability of satisfying temporal logic specifications with randomly evolving predicates. We first show that temporal logic specifications with stochastic predicates can be transformed to specifications with deterministic predicates on an augmented space which is extended to include the stochastic space of predicate's uncertainty. We then utilize barrier certificates on an augmented space to provide tractable optimization-based conditions and to avoid the computational burden of dynamic programming. Focusing on linear dynamics and safety-type specifications, we derive analytical conditions under which barrier certificates guarantee bounds on the probability of violating the stochastic safety predicates. The approach is demonstrated on numerical case studies.
Extended domination-monotonicity conditions solve linear-convex and linear-quadratic problems with time-dependent random input constraints.
abstractclick to expand
This paper extends the domination-monotonicity conditions, which guarantee the well-posedness of extended mean-filed forward-backward stochastic differential equations (extended MF-FBSDEs), from the previously studied linear framework to a nonlinear setting by incorporating nonlinear adjoint functions. Utilizing this generalized well-posedness result for extended MF-FBSDEs in conjunction with other refined analytical techniques, we address two classes of stochastic quadruple optimal controlled problems: a linear-convex problem and a linear-quadratic problem with input constraints that are permitted to be time-dependent and random. For each problem, we establish the existence and uniqueness of optimal controls and derive their explicit closed-form representations.
A nonlinear kernel-free soft quadratic surface support vector machine model with 0-1 loss function ($L_{0/1}$-SQSSVM) is proposed for binary classification problems, which is non-convex discontinuous. We are devoted to establishing the first and the second-order optimality conditions for the $L_{0/1}$-SQSSVM. We establish a stationary equation using the properties of proximal operator of 0-1 loss function. We design a Newton method based on the stationary equation to solve $L_{0/1}$-SQSSVM model and prove that the Newton method has local quadratic convergence under the second-order sufficient condition. Numerical experience on artificial datasets and benchmark datasets demonstrate that the Newton method for $L_{0/1}$-SQSSVM achieves higher classification accuracy with less CPU time cost than other state-of-the-art methods.
We consider a mutual information (MI) regularized version of optimal density control of a discrete-time linear system. MI optimal control has been proposed as an extension of maximum entropy optimal control to trade off between control performance and benefits provided by stochastic inputs. MI regularization induces stochasticity in the policy, which poses challenges for applications of MI optimal control in safety-critical scenarios. To remedy this situation, we impose Gaussian density constraints at specified times to directly control state uncertainty. For this MI optimal density control problem, we propose an alternating optimization algorithm and derive the closed form of each step in the algorithm. In addition, we reveal that the alternating optimization of the MI optimal density control problem coincides with that of the so-called generalized Schr\"{o}dinger bridge problem associated with the discrete-time linear system.
Two separate limits recover classical local optimal control from the nonlocal setting for both convex p-Laplacian and polyquasiconvex cases.
abstractclick to expand
In this paper we consider optimal control problems in the nonlocal function space framework of Bellido-2023, where there are two different parameters: a horizon parameter $\delta > 0$; and a fractional parameter $s \in (0, 1)$. The constraints are given in the form of minimizing an energy density, and we will focus on two particular cases: the well-posed case where the underlying energy density is convex and is given by the nonlocal $p$-Laplacian; and a more general poly/quasiconvex energy for which minimizers exist but may not be unique. The study is concluded by analyzing the approximation to local problems in two parallel ways, either taking the fractional parameter $s$ to $1$ or the horizon parameter $\delta$ to $0$.
We study optimistic bilevel optimization when the lower-level problem has a non-isolated manifold of minimizers. In this setting, the hyper-objective may be non-differentiable because the upper-level criterion must choose among multiple lower-level solutions. Under a local Polyak--{\L}ojasiewicz (P{\L}) condition, we show that differentiability does not require the lower-level solution set to be a singleton: uniqueness of the optimistic selection is sufficient. This yields an explicit pseudoinverse-based hyper-gradient formula extending the classical singleton-minimizer result. We further characterize the regularity of the hyper-objective: non-degeneracy of the selected minimizer along the solution manifold yields local smoothness, while failure of uniqueness can create many non-differentiable points and failure of non-degeneracy can destroy all positive H\"older regularity of the hyper-gradient. Motivated by this theory, we propose HG-MS, a select-then-differentiate method combining explicit optimistic selection with efficient pseudoinverse-based hyper-gradient computation. Despite the nonconvex nature of optimistic selection over the lower-level solution manifold, we show that HG-MS converges to a stationary point of the optimistic objective with complexity governed by the intrinsic dimension of the solution manifold rather than its ambient dimension. Empirically, we test a practical variant of HG-MS for matched-budget LLM source reweighting. This variant preserves the select-then-differentiate principle and obtains the best GSM8K/MATH scores across the tested backbones, along with competitive or best MT-Bench instruction-following results.
Existence of an L2-cost minimizer is shown and first-order conditions are obtained from the adjoint system for two immiscible incompressible
abstractclick to expand
This article discusses an optimal control problem for a phase field model of two immiscible incompressible fluid flow, incorporating surface tension effects. The optimal control problem is defined with a $L^2$-cost functional and subject to the constraints governed by a system of coupled Stokes-Cahn-Hilliard-Oono equations. In this model, fluids are separated by a dynamic diffuse interface of finite width. We investigate the optimality condition of a given control. Initially, we establish the existence of an optimal solution for the coupled optimal control problem. Subsequently, we derive the optimality condition with respect to the corresponding adjoint system.
Simple algorithm certifies lack of lower bound in any polynomial optimization problem and reveals its asymptotic geometry.
abstractclick to expand
Global polynomial optimization methods typically rely on compactness of the feasible region in order to find solutions. These methods can incur considerable computational expense and most commercially available solvers do not verify the existence of a solution prior to undergoing global search. In this manuscript we propose a simple pre-processing algorithm to determine if an arbitrary polynomial optimization problem is unbounded from below thereby providing information about the problem's asymptotic geometry prior to solving the problem if a solution can be found.
A sequence of uniformly hyperbolic equations approximates the interior-degenerate system, enabling the first controllability results in more
abstractclick to expand
In this paper, we establish the existence of solutions for a particular class of degenerate hyperbolic equations. Following this, we approximate these degenerate equations by employing a sequence of uniformly hyperbolic equations. Notably, this specific approximation result has remained unexplored in the existing body of literature. Ultimately, we utilize this approximation framework to derive controllability results for the original degenerate hyperbolic equations, marking what could potentially be the inaugural investigation into higher-dimensional degenerate hyperbolic equations.
This paper presents an H2-optimal model order reduction (MOR) method for linear systems with quadratic outputs based on Riemannian optimization. The H2-optimal MOR is formulated as an optimization problem in which the optimization variables are selected directly as the coefficient matrices of reduced models. The product manifold is defined properly to impose the stability condition for reduced models. By exploiting the geometric properties of the product manifold, we derive an explicit formula for Riemannian gradient of the objective function, and then a limited-memory Riemannian BFGS method is adopted to solve the resulting optimization problem iteratively. In contrast to selecting projection matrices, optimizing coefficient matrices of reduced models reduces the amount of variables dramatically. Numerical simulation results demonstrate that reduced models accurately approximate the original system and exhibit superior performance in terms of H2 error, which confirms the effectiveness of the proposed algorithm.
We propose a generative framework for learning stochastic dynamics from endpoint and intermediate distributional observations. The method formulates generation as a McKean-Vlasov control problem in which terminal and time-marginal laws are enforced through soft energy constraints. The associated optimality system is a forward-backward stochastic differential equation (FBSDE) whose backward component receives a continuous drift induced by the marginal law penalties. This provides a principled alternative to hard interpolation or optimal transport maps between observed distributions: the model learns a stochastic path law whose dynamics remain globally coupled through the mean-field objective. We derive the reduced FBSDE system for quadratic control cost and constant diffusion, connecting terminal and marginal law flat derivatives to score-like training signals. The resulting neural solver is evaluated on low-dimensional distributional benchmarks, where it recovers smooth stochastic paths matching prescribed marginal laws. In a higher-dimensional ALAE latent space, endpoint supervision is used as a qualitative stress test for transporting non-smiling faces toward smiling ones in a pretrained representation. We then use articulated human motion as a structured high-dimensional case study on a curated AMASS low-to-high position dataset, using SMPL-H pose sequences and reduced pose representations. The experiments show that soft marginal law constraints can produce coherent stochastic trajectories whose intermediate distributions follow the observed evolution of human motion. The code is available at https://github.com/murex/deep-mkv-gen/tree/main.
This paper presents a computationally efficient approach for Gaussian process model predictive control (GP-MPC), where Gaussian process (GP) regression is used to complement a baseline model of the system dynamics. The proposed method achieves propagation of both the predicted mean and variance, thereby significantly reducing conservativeness compared with existing GP-MPC formulations. The nonlinear GP-MPC problem is reformulated into an exact linear parameter-varying (LPV) structure that preserves the nonlinear prediction dynamics in affine form without introducing further approximation. Moreover, closed-form derivations of moment matching (MM) predictions for sparse GPs are developed, including both mean and variance propagation under uncertain inputs, which improves scalability to larger datasets. This further enables recasting the resulting GP-MPC problem as a sequence of quadratic programs (QPs), which can be solved efficiently. The proposed framework significantly improves runtime efficiency while maintaining prediction accuracy, as demonstrated through simulation and real-world experiments on a Crazyflie 2.1 micro quadcopter.
Large-scale machine learning models are trained on clusters of machines that exhibit heterogeneous performance due to hardware variability, network delays, and system-level instabilities. In such environments, time complexity rather than iteration complexity becomes the relevant performance metric for optimization algorithms. Recent work by Tyurin and Richt\'{a}rik (2023) established the first time complexity analysis for parallel first-order stochastic optimization, proposing Rennala SGD as a time-optimal method for smooth nonconvex optimization. However, Rennala SGD is fundamentally a modification of SGD, and variance reduction techniques are known to improve the iteration complexity of SGD. In this work, we investigate whether variance reduction can also improve time complexity in heterogeneous systems. We show that, under a mean-squared smoothness assumption, variance reduction can improve time complexity in relevant parameter regimes. To this end, we propose Rennala MVR, a variance-reduced extension of Rennala SGD based on momentum-based variance reduction, and analyze its oracle and time complexity. We establish lower bounds for time complexity under these assumptions. On a stochastic quadratic benchmark, experiments with the exact method support the theory, while neural-network experiments with a practical inexact variant show similar empirical gains over Rennala SGD.
We design Local LMO - a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, and $t_k>0$ is a suitably chosen radius which can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.
In this paper, we propose several set-point control schemes for achieving finite-time regulation in a class of Euler--Lagrange systems with $n$ degrees of freedom and uncertain potential energy. The proposed controllers are based on composite adaptive control approaches. Each control scheme consists of two main components: a Proportional--Derivative (PD)-based nonlinear feedback term and a finite-time parameter estimation law. The estimation laws rely on the Dynamic Regressor Extension and Mixing (DREM) technique, which can be designed using either the Kreisselmeier or the least-squares dynamic regressor extensions. These results extend recent advances in finite-time adaptive control for Euler-Lagrange systems. To the best of the authors' knowledge, the composite adaptive control formulation proposed here, which does not employ well-known Slotine and Li adaptive control structure, has not been studied in detail yet. The properties of the proposed controllers are rigorously analyzed using Lyapunov methods. The performance of the controllers is thoroughly evaluated through extensive simulation studies.
The resulting dual problem has one-to-one local solution matches and global matches when parameters are chosen suitably.
abstractclick to expand
Simple cardinality refers to counting nonzero elements of an independent variable satisfying certain properties. Composite cardinality is a simple counting process composited with an affine mapping, and is therefore more complicated than the simple cardinality. We study the composite cardinality optimization problem (CCOP) with structures covering a wide range of applications. Through the use of the stationary duality, we reduce the composite counting to simple counting, and thereby obtain a dual formulation of CCOP. For both primal and dual problems, we investigate the sufficient conditions for the existence of global solutions. Those conditions are validated on representative examples from existing literature. We then show that local solutions of the primal and dual problems are equivalent to their stationary points. This result further helps us establish a one-to-one correspondences between primal and dual local solutions. We also demonstrate that the correspondence holds for a pair of global solutions to the primal and dual problems, provided that the dual weighted parameters are appropriately selected. The reported theoretical results lay foundation for developing numerical algorithms for CCOP in future.
FDR improves constants from prior work and matches a new lower bound for convex-plus-strongly-convex sums.
abstractclick to expand
When minimizing the sum of a convex and a strongly convex function, or when finding the zero of the sum of a monotone operator and a strongly monotone operator, Chambolle and Pock (2010) and Davis and Yin (2015) proposed accelerated mechanisms that achieve an $\mathcal{O}(1/N^2)$ convergence rate for the squared distance to the solution, but the optimality of this rate was not established. In this work, we present Fast Douglas--Rachford Splitting (FDR), an accelerated method that improves the constants established in the prior works, and provide a complexity lower bound establishing that both the $\mathcal{O}(1/N^2)$ convergence rate and the leading-order constant of FDR's rate are optimal.
Replacing the Hessian with the closed convex hull of limiting Hessians produces an interval index that reduces to the smooth case and is nil
abstractclick to expand
Davydov, Moldavskaya, and Zitikis introduced local indices for quantifying the lack of convexity of a \(C^2\) function by measuring the nuclear-norm distance of its Hessian from the cone of positive semidefinite matrices. This paper develops a local analogue for functions of class \(C^{1,1}\). At a point \(x\), the classical Hessian is replaced by the Clarke-type generalized Hessian set \(\Hess(h;x)\), defined as the closed convex hull of limiting Hessians at nearby twice differentiability points. Evaluating the same spectral functional over \(\Hess(h;x)\) gives an interval-valued local nonconvexity index whose lower and upper endpoints represent, respectively, the least and greatest visible second-order nonconvexity at \(x\). The construction reduces to the original smooth index when \(h\in C^2\), vanishes for convex \(C^{1,1}\) functions, is invariant under orthogonal changes of variables, satisfies a subadditivity inequality for the upper endpoint under sums, and is upper semicontinuous in its upper endpoint. We also relate the upper endpoint to a pointwise weak-convexity curvature modulus and give explicit \(C^{1,1}\setminus C^2\) examples. The paper is deliberately local in scope: it proposes a scalar diagnostic extracted from generalized Hessian sets, not a replacement for the richer second-order variational theory of nonsmooth convexity.
Lyapunov-IQC modeling turns the problem into an LMI feasibility check solvable by convex optimization.
abstractclick to expand
We develop a unified Lyapunov-integral quadratic constraint (IQC) framework for establishing uniform stability of first-order accelerated optimization algorithms in the $\beta$-smooth and $\gamma$-strongly convex regime. Classical analyses of uniform stability, such as the work of Hardt, Recht, and Singer for stochastic gradient descent (SGD), rely on direct coupling arguments and case-by-case control of iterate differences under random sampling. Extending such arguments to accelerated methods, such as Nesterov Accelerated Gradient (NAG), is complicated by the presence of higher-order state dynamics induced by momentum. We first extend this classical approach with the use of Lyapunov functions to provide a uniform stability bound for smooth quadratic NAG, and supplement this result with small-scale numerical experiments. We then extend this framework by modeling first-order accelerated optimizers as Lur'e-type feedback interconnections between a linear dynamical system and a (non-linear) gradient operator. $\beta$-Smoothness and $\gamma$-strong convexity are encoded a sector IQC inequality. Under this representation, uniform stability is certified via the existence of a quadratic Lyapunov function satisfying a finite-dimensional linear matrix inequality (LMI) in the form of a feasibility problem, which can be solved via semi-definite programming (SDP). We instantiate this framework for NAG and show how classical uniform stability bounds can be recovered via this framework. These results underscore a structural connection between optimization dynamics and robust control theory, providing a modular methodology for reliable and reproducible numerical certification of uniform stability and generalization behavior of first-order methods via convex optimization tools that is adaptable to increasingly complex optimization algorithms.
Robust learning aims to maintain model performance under noise, corruption, and distributional shifts, which are prevalent in modern machine learning applications. This work shows that examples of robust learning problems can be formulated as (strongly) quasar-convex optimization problems, which admit a benign landscape with no saddle points. We then propose HiPPA, an inexact high-order proximal-point method that employs a model-value gap to control the inexactness of subproblem solutions. Notably, we prove global convergence of HiPPA to global minima and establish that it attains a (local) linear or superlinear convergence rate, depending on the regularization order and inexactness control. Our numerical experiments on robust feature-alignment distillation indicate strong empirical performance of HiPPA and results consistent with our theoretical findings.
Self-optimizing control (SOC) aims to maintain near-optimal process operation by judiciously selecting controlled variables (CVs). In this series of work, the generalized global SOC (g2SOC) approach is proposed, which extends the concept of SOC to the whole operation space and uses general nonlinear functions to design CVs instead of linear combinations. In the first part of this series work, two numerical approaches for g2SOC are proposed: the optimization-based approach and the regression-based approach, based on a theoretical analysis of the existence of perfect self-optimizing CVs. The CVs designed by the former perform better, but are usually infeasible for large-scale problems. In this paper, we propose an algorithm called objective-guided controlled variable learning (OGCVL) that combines the advantages of both and has a better scalability. OGCVL is proposed for efficient CV design that seamlessly integrates symbolic and numerical computation techniques. Finally, the effectiveness of the OGCVL method is verified in two numerical examples. Both examples illustrate show that the OGCVL method is able to achieve good results while maintaining computational efficiency and is also feasible in large-scale problems.
Solving optimal control problems via large-scale NLP solvers depends on discretizing continuous dynamics. Yet, this transcription step hides critical vulnerabilities-most notably truncation error and invariant drift-that can drive solvers toward dynamically infeasible or suboptimal trajectories. To expose these hidden failures, we introduce a problem- and transcription-agnostic adversarial objective that leverages the structure of local truncation-error bounds to aggressively amplify such defects. When applied to a 6-DOF rocket-landing problem, we reveal a stark reliability gap: of fourteen transcription methods tested, only three satisfy rigorous validation criteria. These results also expose a striking performance inversion: even in the absence of classical stiffness, a fourth-order implicit scheme (GL2) matches the fidelity of a sixth-order explicit method (RK6). Using B-series expansions and symplectic Runge-Kutta theorems, we isolate the specific truncation errors and quaternion-invariant drift responsible for these failures. Crucially, these theoretical vulnerabilities dictate operational performance: in practical lateral-divert scenarios, the implicit GL2 consistently outperforms the explicit RK6 in both end-to-end solve speed and robustness.
A single-sequence Mangasarian-Fromovitz type check upgrades it to directional stationarity under nonsmooth geometric constraints.
abstractclick to expand
Approximate stationarity conditions provide necessary optimality conditions without requiring additional assumptions by demanding that a perturbed stationarity system possesses solutions as the involved perturbations tend to zero. Together with associated approximate constraint qualifications, which are typically rather mild, they raised much interest in the optimization community during the last decade. In parallel, directional stationarity conditions became quite popular as they sharpen standard stationarity conditions by incorporating data associated with underlying critical directions. The purpose of this paper is twofold. First, we melt the aforementioned concepts of approximate and directional stationarity to formulate and study so-called approximate directional stationarity. For the underlying model problem, an optimization problem with nonsmooth geometric constraints is chosen, which covers diverse practically relevant applications. The role of approximate directional stationarity as a necessary optimality condition is investigated in much detail, complementing results from the literature. Second, we formulate a qualification condition which, based on an approximately directionally stationary point, can be exploited to infer its directional stationarity. The latter condition depends on one particular sequence verifying approximate directional stationarity and merely requires to check a simple condition of Mangasarian--Fromovitz type stated in terms of the directional tools of limiting variational analysis. This contrasts standard approximate constraint qualifications that typically demand a certain stable behavior of all sequences validating approximate stationarity. Throughout, various approaches to verify directional stationarity of local minimizers are established, and illustrative examples are presented to make the theoretical results more accessible.
We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $\epsilon$-KKT point with $\tilde{O}(\epsilon^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(\epsilon^{-4})$ complexity bound that improves upon the existing $\tilde{O}(\epsilon^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $\epsilon$-KKT point with $\tilde{O}(\epsilon^{-9})$ oracle complexity.
In power systems, the risk of wildfire ignition has increased significantly in recent years. The impact and severity of these events on energy dispatch, as well as their societal ramifications, make wildfire prevention critical for power system planning and operation. A common intervention by system operators is to de-energize transmission lines to mitigate the risk of fire caused by equipment failures. With the growing integration of variable renewable generation, managing and preparing the system to de-energization under wildfire risk has become even more challenging. In this context, mitigation decisions such as installing battery energy storage systems and undergrounding transmission lines can reduce the risk and adverse effects associated with de-energization and renewable generation variability. This paper presents a robust optimization model to determine the optimal location of battery storage and undergrounding of transmission line investment, utilizing representative weeks and uncertainty sets to capture the temporal relationship of uncertain variables. Specifically, this paper addresses: (i) the worst-case realization of ignition risk leading to the de-energization of transmission lines, combined with the worst-case realization of renewable energy availability, and (ii) the optimal investment decisions for energy storage capacity and undergrounding of transmission lines that are exposed to ignition risk. The proposed model is formulated as a mixed-integer linear programming (MILP) problem, employing duality theory and binary decomposition to address nonlinearities, and is solved using a column-and-constraint generation algorithm. The proposed framework is evaluated on a model of the San Diego power system, demonstrating its practical effectiveness in improving the resilience to wildfire risk.
In centralized, distributed, and federated learning with stochastic gradients and $n$ workers, it was recently shown that it is infeasible to find an $\varepsilon$-stationary point faster than $\tilde{\Omega}\left(\min\left\{\frac{d \kappa L \Delta}{\varepsilon} + \frac{h L \Delta}{\varepsilon} + \frac{h \sigma^2 L \Delta}{n \varepsilon^2},\; \frac{h \sigma^2 L \Delta}{\varepsilon^2} + \frac{h L \Delta}{\varepsilon}\right\}\right)$ seconds in both homogeneous and heterogeneous settings under standard assumptions: $L$-smoothness, $\sigma^2$-bounded unbiased stochastic gradients, and lower boundedness of the function, i.e., $f(x) \geq f^*$ for all $x \in \mathbb{R}^d$, where $\Delta = f(x^0) - f^*$, $h$ is the computation time, $\kappa$ is the communication speed between the workers and the server, and $d$ is the dimension of the iterates and gradients. This result is pessimistic since it does not allow a complexity in which both $\frac{d \kappa L \Delta}{\varepsilon}$ and $\frac{h \sigma^2 L \Delta}{\varepsilon^2}$ improve with $n$, even when using random sparsification techniques; moreover, this lower bound can be matched by either non-distributed SGD or vanilla Synchronous SGD, which reduces the impact of recent progress in the design of compression-based methods. In this work, we challenge this limitation and propose new compressed methods, Inkheart SGD and M4, and show that under an additional structural assumption, which is necessary due to the lower bound and which does not restrict the class of considered problems, we achieve new state-of-the-art time complexities that break this pessimistic barrier and allow scaling with the number of workers $n$.
We consider a first order stochastic optimization framework where, at each iteration, $K$ independent identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variances. For the considered heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied over noisy gradients. In this paper, we take an alternative approach and propose a novel stochastic first order method dubbed Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling, R-SGD-Mini for short. The core idea of R-SGD-Mini is to split the $K$-sized data batch into $M$ distinct data chunks, form for each chunk the stochastic gradient, and update the solution estimate with respect to the stochastic gradient direction of the chunk that is medoid of gradients of all data-chunks. Under a general class of symmetric heavy-tailed gradient noises and a standard non-convex setting, we establish explicit bounds on the expected time-averaged squared gradient norm. More precisely, we show that the latter quantity converges at rate $\mathcal{O}(T^{-1})$ to a small neighborhood of zero; we explicitly characterize this neighborhood in terms of noise and algorithm's parameters. Moreover, if the time horizon is known in advance, we establish the rate of $\mathcal{O}(T^{-\frac{1}{2}}).$ Furthermore, when clipping is incorporated, we obtain convergence guaranties in the high-probability sense and recover the same rate. Experimental results indicate that R-SGD-Mini and its clipped variant consistently perform favorably compared to SGD, clipped SGD and Median-of-Means based methods.
Surrogate function and relaxed bias conditions enable communication-efficient iterations that also link to non-convex optimization.
abstractclick to expand
This paper investigates the distributed fixed point seeking problem of sum-separable stochastic operators over the multi-agent network. Based on inexact Krasnosel'ski\u{\i}--Mann iterations, the communication-efficient distributed algorithm is proposed under the relaxed growth bias and variance conditions, generalizing traditional unbiased and bounded additive variance assumptions. To enhance communication efficiency, we integrate communication compression and dynamic period skipping techniques, particularly adopting a unified compressor that allows both relative and absolute compression errors. By introducing a surrogate function for general non-contractive and contractive operators, we establish convergence guarantees of the distributed fixed point iteration, achieving among the first theoretical unifications with distributed non-convex optimization algorithms. Finally, numerical simulations validate the effectiveness of the theoretical results.
Both the level of conservativeness and the computational burden in robust optimization are critically influenced by uncertainty set design. However, contextual side information is rarely exploited in robust dispatch of power systems characterized by irregular data distributions, which hinders the explicit characterization of the relationship between covariates and uncertain parameters. To address this issue, a data-driven method for constructing contextual-aware uncertainty set is proposed in this letter. Based on a conditional Gaussian mixture model, a set of covariates is leveraged as side information to design uncertainty sets tailored to historical data exhibiting irregular distributions. The resulting set is formulated as a union-of-subsets formulation, and a mixed integer linear reformulation is adopted to describe the worst-case realization across all subsets. Finally, the effectiveness of the proposed method is demonstrated through numerical experiments applied to robust unit commitment.
This paper investigates the difference between the circular and elliptical cases in one-on-one pursuit and evasion problems. Using the simultaneous differential equation derived by Barton and Eliezer, we derive a dynamical system based on the assumption that the shape of the pursuer's trajectory is unaffected by the evader's speed. The dynamical system involves the angular difference between the velocity vectors of the players and their separation distance. When the evader orbits a circle, the dynamical system is autonomous with an asymptotically stable equilibrium point. By contrast, if the evader orbits an ellipse, the dynamical system becomes non-autonomous and lacks an equilibrium point. To handle the singularity at capture, we reformulate the system using a complex variable that includes information about the logarithmic distance and the angular difference. We establish two main results: when the pursuer is faster than the evader, the pursuer captures the evader in finite time, and we derive an explicit upper bound for the capture time; when the pursuer is slower, the system possesses a unique periodic solution to which all trajectories converge globally and asymptotically.
Characterizations via normal cone mappings show when small data changes preserve equilibrium sets in generalized games.
abstractclick to expand
Lagrangian generalized Nash equilibriums (LGNEs) were introduced by Rockafellar (2024) for a class of generalized Nash equilibrium problems (GNEPs) in which each player's strategy is subject to conic constraints. This paper investigates the stability properties of the LGNE solution set, specifically focusing on the Aubin property, isolated calmness, and Lipschitz continuous single-valued localization. For general conically constrained GNEPs, characterizations of the Aubin property and isolated calmness of the LGNE solution mapping under canonical perturbations are established. These characterizations are formulated using the coderivative and graph derivative of normal cone mappings. Subsequently, these general results are specialized to GNEPs with equality and inequality constraints, yielding explicit characterizations for both the Lipschitz continuous single-valued localization and isolated calmness of the corresponding LGNE solution mapping, which are described by nonsingularity of linear complementarity sytems. For GNEPs with shared conic constraints, the Aubin property and isolated calmness of the consensus LGNE solution mapping--where identical Lagrange multipliers are assigned to the shared constraint--are first characterized. We further analyze the case when the conic constraints are specialized as equalities and inequalities. Finally, for classical conically constrained Nash equilibrium problems, the Aubin property and isolated calmness of the Lagrangian Nash equilibrium solution mapping are also analyzed.
Quadratic approximations and a dynamic penalty keep subproblems strongly convex, delivering both expected and high-probability bounds on the
abstractclick to expand
This paper considers stochastic optimization problems with weakly convex objective and constraint functions. We propose Prox-PEP, a proximal method equipped with quadratic subproblems. To handle nonlinear equality constraints, we employ an exact penalty approach, transforming them into inequality constraints with auxiliary slack variables. At each iteration, we construct quadratic approximations for both the objective and the constraint functions to facilitate efficient subproblem computation. By carefully designing the second-order approximation matrices, the subproblem constructed via the augmented Lagrangian function is strictly guaranteed to be strongly convex. Furthermore, we adopt a dynamic strategy for the equality penalty parameter: it monotonically increases up to a predefined threshold and remains constant thereafter. Building upon this algorithmic framework, we establish comprehensive asymptotic complexities. We prove that Prox-PEP achieves an $\mathcal{O}(T^{-1/4})$ average expected oracle complexity for $\epsilon$-KKT stationarity, specifically bounding the squared norm of the gradient of the Moreau envelope of the Lagrangian function, alongside constraint violations and complementarity conditions. Additionally, under standard light-tailed martingale noise assumptions, we derive an $\mathcal{O}(T^{-1/8})$ high-probability convergence bound for the norm of the gradient of the Lagrangian's Moreau envelope, as well as $\mathcal{O}(T^{-1/4})$ high-probability bounds for both constraint violations and complementarity conditions.
A bilevel problem reduces to a standard mixed-integer program that matches or beats L1 regularization and recursive elimination on accuracy.
abstractclick to expand
This paper addresses feature subset selection for Support Vector Machines (SVMs) based on the cross-validation criterion. Unlike statistical criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), cross-validation requires only the mild assumption that samples are independently and identically distributed (i.i.d.). For this reason, the cross-validation criterion is expected to work well across a wide range of prediction problems, and it has already demonstrated its usefulness as a feature subset selection method for regression. The objective of this paper is to extend the framework of best feature subset selection via the cross-validation criterion to SVM classification problems. This subset-selection problem can be formulated as a bilevel mixed-integer optimization problem. Because bilevel optimization problems are generally hard to solve, we introduce the Least Squares Support Vector Machine (LS-SVM), whose optimality conditions admit a closed-form expression, and reduce the problem to a single-level mixed-integer optimization problem. This reformulation allows us to solve the problem using standard optimization software. We evaluate the proposed framework through simulation experiments that compare it with a regularization-based method (L1-regularization), a sequential search method (recursive feature elimination), and mixed-integer optimization (MIO) based on statistical criteria. The results show that the proposed framework achieves favorable performance both in classification accuracy and feature selection accuracy.
This paper details a novel indirect method for solving constrained optimal control problems (OCPs) directly in continuous-time function space. The KKT conditions are embedded in a non-smooth complementarity function, which enables their reformulation as a rootfinding problem in Banach space. This problem is then solved using a non-smooth Newton method. Finally, the paper shows that the Newton update can be obtained by solving a modified differential Riccati equation, where the cost terms are reweighted at every iteration based on the constraint multipliers. Numerical simulations show the effectiveness of the method, which converges superlinearly up to the tolerance of the ODE solver.
The rapid proliferation of distributed energy resources (DERs) and the electrification of residential loads offer significant potential for grid flexibility but pose stability challenges under static pricing regimes. Specifically, high levels of automation under static Time-of-Use (TOU) tariffs often induce ``device synchronization,'' where simultaneous responses from home energy management systems (HEMS) create artificial demand peaks that threaten grid stability. This paper proposes a privacy-preserving, one-way dynamic signaling framework to unlock deep demand flexibility from HEMS. We utilize a feedback-based learning algorithm that updates day-ahead price profiles based on aggregate substation demand and environmental contexts, effectively closing the loop between utility objectives and aggregated edge behaviors. The framework is rigorously validated using high-fidelity simulations on an 84-bus distribution network populated with hundreds of HEMS controlling diverse devices, including HVAC, PV, batteries, and flexible loads. Results demonstrate that the proposed mechanism achieves substantial reductions in both peak demand and total load variation. Extensive analyses across diverse climates and scalable deployments confirm the framework's robustness, indicating that dynamic pricing acts as a force multiplier for DERs, with peak shaving potential increasing significantly under high renewable penetration scenarios.
Usage-based insurance (UBI) uses telematics to align premiums with risk and encourage safe driving. However, deploying these programs is challenging due to heavy-tailed claim costs, nonstationary driver behavior, and limited incentive budgets. While existing research focuses on profiling drivers, prescriptive pricing remains underexplored. We propose an optimal control framework that integrates telematics directly into dynamic pricing. Our approach (i) learns claim frequency and severity, (ii) models multi-period behavioral evolution in response to discounts, and (iii) optimizes portfolio-wide discount allocation using a Lagrangian relaxation. This decomposes the non-convex centralized problem into independent dynamical systems. We theoretically prove this relaxation's duality gap vanishes as the portfolio scales, guaranteeing asymptotic optimality. We validate our approach computationally on a simulated industry-scale portfolio. Our results demonstrate not only the computational tractability of our approach but also that it outperforms static baselines, reducing both expected losses and claim probabilities to benefit insurers and policyholders alike.
An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model training. The purpose of the auxiliary loss is to provide a mechanism for creating a low-order Hessian-type approximation for the original loss. The proposed algorithm employs the resulting low-order second-derivative approximation terms in place of the second-order momentum terms (i.e., squared elements of the gradient of the loss function) in an overall scheme that has computational cost on par with an Adam-type approach. Whereas the squared elements of a gradient vector do not necessarily approximate second-order derivatives well, by careful construction of the auxiliary loss, second-order derivative-type approximations for the original loss can be computed and employed by the algorithm in an efficient manner. A convergence guarantee is provided for the proposed algorithm that is on par with guarantees available for similar stochastic diagonal-scaling methods. The results of numerical experiments show situations when the proposed algorithm outperforms Adam and other popular modern optimizers.
Most first-order optimizers treat matrix-valued parameters as vectors, ignoring the intrinsic geometry of hidden-layer weights in neural networks. Muon addresses this mismatch by updating along the polar factor of a momentum matrix, but its theoretical understanding has lagged behind practice. In particular, practical implementations incorporate Nesterov momentum, compute the polar factor only approximately, and operate with stochastic gradients that may be heavy-tailed. We close this gap by developing a convergence theory for Muon with Nesterov momentum and inexact polar decomposition in non-convex matrix optimization under heavy-tailed noise. Our analysis builds on a unified framework for inexact polar decomposition that captures practical iterative approximations such as Newton-Schulz and quantifies how their errors propagate through the optimization dynamics. Under this framework, we establish an optimal iteration and sample complexity of $O \left(\varepsilon^{\frac{-(3\alpha-2)}{(\alpha-1)}} \right)$ for finding an $\varepsilon$-stationary point, where $\alpha\in(1,2]$ denotes the heavy-tail index. For the inexact-polar setting with $\sigma_1=0$, we also provide guarantees that do not require prior knowledge of $\alpha$. We analyze a randomized low-rank polar decomposition that is substantially more efficient than full-space methods while remaining compatible with our theory. Numerical experiments further demonstrate the effectiveness of the proposed inexact and randomized variants.
We study the facial geometry of the homogeneous pseudo-moment cone \(\Sigma_{n,2d}^*\) and its implications for atomic decomposition of moment matrices. For fixed \(d \ge 2\), we show that if a moment matrix is formed by \(O(n^d)\) generically chosen weighted atoms, then its minimal face in the matrix realization of the pseudo-moment cone is \emph{simplicial} and generated by the planted rank-one atoms. Based on this geometric result, we develop a Carath\'eodory-type extreme-ray decomposition algorithm for spectrahedral cones and show that, when specialized to the pseudo-moment cone, it yields an efficient atomic decomposition method for generically generated moment matrices in the same regime. A stabilized numerical implementation demonstrates strong recovery performance and suggests that, outside the guaranteed regime, the algorithm may serve as a practical sampler of high-rank extreme rays.
MTRBO allocates separate regions for exploitation and exploration, proves global convergence, and beats prior methods on non-convex testbeds
abstractclick to expand
Bayesian Optimization (BO) is a popular framework for optimizing black-box functions. Despite its effectiveness, BO is often inefficient for high-dimensional problems due to the exponential growth of the search space, heterogeneity of the objective function, and low sampling budget. To overcome these issues, this work proposes a multiple trust region-based Bayesian optimization technique(MTRBO). A trust region is a localized region within which an optimization model is trusted to approximate the objective function accurately. Assuming a Gaussian process (GP) as a prior belief about the objective function and based on the posterior mean and variance functions, the method adaptively exploits near the promising current solution inside a trust region. Also explores the most uncertain region in the search space inside another trust region. The theoretical global convergence property of the proposed method is established. Then the work is benchmarked against other state-of-the-art trust-region-based Bayesian optimization algorithms, demonstrating superior performance on a variety of non-convex and high-dimensional test functions. The proposed method outperforms others in terms of solution quality within the sampling budget (the number of function evaluations). The proposed method is applied to the portfolio optimization problem to verify its applicability in real-world scenarios.
We study stochastic zeroth-order optimization with decision-dependent distributions, where the sampling law depends on the current decision and only noisy function values are available. For the non-smooth non-convex setting, we establish an explicit convergence guarantee for finding a $(\delta,\epsilon)$-Goldstein stationary point with stochastic zeroth-order oracle (SZO) complexity of $\mathcal{O}(d^2\delta^{-3}\epsilon^{-3})$. In addition, we show that the above complexity can be achieved with single SZO feedback per iteration. We further extend the analysis to smooth and Hessian-Lipschitz objectives, obtaining complexities $\mathcal{O}(d^2\epsilon^{-6})$ and $\mathcal{O}(d^2\epsilon^{-9/2})$, respectively. In the Hessian-Lipschitz case, this improves the best-known dependence on $\epsilon$ for decision-dependent zeroth-order methods by a factor of $\epsilon^{-1/2}$.
Benders decomposition (BD) is a widely used solution approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often suffers from slow convergence as the master problem grows with an increasing number of cuts. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained using a policy gradient method via the REINFORCE algorithm. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD achieves substantial improvements in computational efficiency and exhibits strong generalization to problems with similar structures but varying data inputs and decision variable dimensions.
This work considers to achieve near-optimal operation for a class of batch processes by employing self-optimizing control (SOC). Comparing with a continuous one, a batch process exhibits stronger nonlinearity with dynamics because of the non-steady operation condition. This necessitates a global version of SOC to achieve satisfactory performance. Meanwhile, it also makes the existing global SOC (gSOC) not directly applicable to batch processes due to the causality amongst variables. Therefore, it is necessary to extend the original gSOC to batch processes. In addition to the nonconvexity challenge of the original gSOC problem, the new extension for batch processes has to face even more challenges. Particularly, the causality due to dynamics of batch processes brings in structural constraints on controlled variables (CVs), making a CV selection problem even more difficult. To address these challenges, the gSOC problem is recast in a vectorized formulation and it is proved that the structural constraints considered are linear in the vectorized formulation. Moreover, a novel shortcut method is proposed to efficiently find sub-optimal but more transparent solutions for this problem. The effectiveness of the new approach is validated through a case study of a fed-batch reactor, where CVs are constructed through a combination matrix with a repetitive structure, resulting in a simple SOC scheme. This simplicity facilitates the implementation of the SOC approach and enhances its practical applicability and robustness.
We study the state estimation problem for linear control systems with quadratic outputs which are locally unobservable at the equilibrium. We show that, despite this inherent lack of observability, an adversary with sensor read and write capability can induce observability by injecting an appropriate signal into the measurement channel. Taking the role of an adversary, we characterize when an injected signal can or cannot induce observability and, in the successful case, construct an observer that achieves local exponential convergence of state estimates to the true states of the system. A simulation study demonstrates our results.