arXiv preprint arXiv:2003.00307

Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning · 2020 · arXiv 2003.00307

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

A Theory on Flow Matching with Neural Networks

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

Establishes convergence guarantees for overparameterized 2-layer ReLU networks in flow matching, generalization bounds for the velocity-field objective, and Wasserstein guarantees for generated samples, using multi-task representation learning bounds.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.

On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks: An Intuitive Insight

cs.LG · 2026-05-24 · unverdicted · novelty 3.0

Class imbalance causes DNNs to underfit minority classes early in training and produce non-generalizable minority representations later by overfitting to minimize overall loss.

citing papers explorer

Showing 3 of 3 citing papers.

A Theory on Flow Matching with Neural Networks · cs.LG · 2026-06-08 · unverdicted · none · ref 269
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models · cs.LG · 2024-01-02 · unverdicted · none · ref 251
On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks: An Intuitive Insight · cs.LG · 2026-05-24 · unverdicted · none · ref 38
