SIAM Review

Optimization methods for large-scale machine learning · 2018

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

math.OC · 2026-05-08 · unverdicted · novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.

UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

UB-SMoE balances expert utilization in heterogeneous federated SMoE fine-tuning via Dynamic Modulated Routing and Universal Pseudo-Gradient, delivering up to 45% compute reduction and 8.7x performance gains for low-resource clients over prior LoRA-rank methods.

Holder Policy Optimisation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

HölderPO unifies token-level aggregation in GRPO via the Hölder mean with a tunable p parameter and annealing schedule, delivering 54.9% average accuracy on math benchmarks and 93.8% success on ALFWorld.

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

stat.ML · 2026-04-23 · unverdicted · novelty 6.0

A novel bias-reduced online covariance estimator for SGD achieves convergence rate n to the power (α-1)/2 times square root of log n without second-order derivatives.

AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

cs.LG · 2026-05-01 · unverdicted · novelty 5.0

AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.

Implicit Neural Optimal Transport via Fixed-Point Optimization

math.OC · 2026-05-11

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

cs.CR · 2026-05-06

citing papers explorer

Showing 8 of 8 citing papers.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
math.OC · 2026-05-08 · unverdicted · none · ref 183
Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
stat.ML · 2026-05-07 · unverdicted · none · ref 217
ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.

UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models
cs.LG · 2026-05-15 · unverdicted · none · ref 38
UB-SMoE balances expert utilization in heterogeneous federated SMoE fine-tuning via Dynamic Modulated Routing and Universal Pseudo-Gradient, delivering up to 45% compute reduction and 8.7x performance gains for low-resource clients over prior LoRA-rank methods.

Holder Policy Optimisation
cs.LG · 2026-05-12 · unverdicted · none · ref 33 · 2 links
HölderPO unifies token-level aggregation in GRPO via the Hölder mean with a tunable p parameter and annealing schedule, delivering 54.9% average accuracy on math benchmarks and 93.8% success on ALFWorld.

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction
stat.ML · 2026-04-23 · unverdicted · none · ref 11
A novel bias-reduced online covariance estimator for SGD achieves convergence rate n to the power (α-1)/2 times square root of log n without second-order derivatives.

AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
cs.LG · 2026-05-01 · unverdicted · none · ref 30
AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.

Implicit Neural Optimal Transport via Fixed-Point Optimization
math.OC · 2026-05-11 · unreviewed · ref 28

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
cs.CR · 2026-05-06 · unreviewed · ref 33
