Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2). Verdicts: unverdicted (2).
citing papers explorer
- Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent
Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.
- Physics-Grounded Adversarial Stain Augmentation with Calibrated Coverage Guarantees
CASA achieves 93.9% slide-level accuracy on Camelyon17-WILDS by adversarially augmenting stains in Macenko space with DKW-calibrated coverage, outperforming baselines including in worst-group accuracy.