7 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (7 citing papers)
citing papers explorer
- Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle
  Local LMO is a new projection-free method that achieves the convergence rates of projected gradient descent for constrained optimization by using local linear minimization oracles over small balls.
- Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
  DPO-RLHF equivalence holds only on the condition that the optimal policy prefers the human-preferred responses; otherwise DPO optimizes relative advantage and can prefer worse outputs. The paper introduces CPO to address this.
- Sinkhorn Treatment Effects: A Causal Optimal Transport Measure
  The Sinkhorn treatment effect is a new entropic optimal transport measure of divergence between counterfactual distributions that admits first- and second-order pathwise differentiability, debiased estimators, and asymptotically valid tests for distributional treatment effects.
- Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
  Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
- The Multi-Block DC Function Class: Theory, Algorithms, and Applications
  The Multi-Block DC class admits polynomial-size DC decompositions for problems that require exponential size under standard DC programming and supplies explicit constructive formulations for deep ReLU networks together with convergent batch and stochastic algorithms.
- Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
  Listwise Policy Optimization explicitly performs target-projection on the LLM response simplex, unifying and improving group-based RLVR methods with monotonic improvement and flexible divergences.
- Stochastic Optimization and Data Science
  The paper motivates stochastic optimization problems from statistical perspectives and describes offline and online approaches to solve expectation minimization problems.