pith. machine review for the scientific record.

arxiv: 2604.25965 · v1 · submitted 2026-04-28 · 📊 stat.ML · cs.LG

Recognition: unknown

Adversarial Robustness of NTK Neural Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 15:04 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords adversarial robustness · NTK · neural tangent kernel · nonparametric regression · Sobolev spaces · minimax rates · gradient flow · early stopping

The pith

NTK neural networks achieve the minimax optimal rate for adversarial regression in Sobolev spaces when trained with gradient flow and early stopping.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies the adversarial robustness of neural tangent kernel networks in nonparametric regression settings. It first determines the lowest possible error rates any estimator can guarantee when recovering Sobolev-smooth functions under adversarial perturbations. It then shows that NTK networks reach exactly those rates if trained by gradient flow stopped before overfitting occurs. The same networks lose robustness when they interpolate the training data exactly. These findings clarify when kernel-like deep models can deliver guaranteed protection against attacks in regression tasks.

Core claim

The paper establishes minimax optimal rates for adversarial nonparametric regression in Sobolev spaces and proves that NTK neural networks trained via gradient flow with early stopping attain these rates. In contrast, the minimum-norm interpolant in the overfitting regime is vulnerable to adversarial perturbations.
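The objects in this claim can be written out. The sketch below uses the norms described in the simulated rebuttal (a Euclidean perturbation ball of radius ε for the adversary, a Sobolev ball H^s(R) for the target) and the standard nonparametric exponent; the ε-dependent term is a reconstruction consistent with those statements, not the paper's verbatim theorem:

```latex
% Adversarial risk of an estimator \hat f against the target f (sketch):
R_A(\hat f, f)
  \;=\; \mathbb{E}_{X,\,\mathcal{D}_n}\!\Big[\sup_{\|x' - X\|_2 \le \epsilon}
        \big|\hat f(x') - f(X)\big|^2\Big].
% Claimed achievable rate over the Sobolev ball H^s(R), up to constants,
% for gradient flow stopped at t^\ast \asymp n^{\frac{2s}{2s+d}}:
\sup_{f \in H^s(R)} R_A\big(\hat f_{t^\ast}, f\big)
  \;\lesssim\; n^{-\frac{2s}{2s+d}} \;+\; \epsilon^{\,2(1 \wedge s)}.
```

Taking ε = 0 recovers the standard non-adversarial minimax rate, which is why the referee's question about whether the adversarial problem is genuinely harder turns on the second term.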

What carries the argument

The neural tangent kernel governing infinite-width gradient flow dynamics, together with early stopping that prevents the network from reaching the interpolating solution.
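This mechanism can be sketched numerically. The following is a minimal stand-in, not the paper's construction: a Laplacian kernel (whose RKHS is norm-equivalent to a Sobolev class, as the 1D NTK's is) replaces the NTK, and the closed-form spectral filter for kernel gradient flow shows how a finite stopping time keeps the fit away from the interpolant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
X = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * X) + 0.3 * rng.normal(size=n)

# Stand-in kernel: exp(-|x - y|), whose RKHS is a Sobolev class, playing
# the role the NTK plays in the paper (an illustrative assumption).
K = np.exp(-np.abs(X[:, None] - X[None, :]))
evals, evecs = np.linalg.eigh(K)

def flow_coeffs(t):
    # Kernel gradient flow stopped at time t acts as the spectral filter
    # (1 - exp(-t * lambda / n)) / lambda on each eigendirection of K.
    filt = (1.0 - np.exp(-t * evals / n)) / evals
    return evecs @ (filt * (evecs.T @ y))

mses = {}
for t in (1.0, 50.0, 1e6):
    alpha = flow_coeffs(t)
    mses[t] = float(np.mean((K @ alpha - y) ** 2))
    print(f"t = {t:>9}: train MSE {mses[t]:.5f}")
# The training residual equals exp(-t K / n) y, so train MSE decreases
# monotonically in t; as t -> infinity the flow reaches the minimum-norm
# interpolant, which the paper identifies as the fragile regime.
```

Early stopping truncates the small-eigenvalue directions of the kernel, which is exactly the spectral regularization the paper credits for robustness.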

Load-bearing premise

The training dynamics of the neural network are exactly described by the kernel gradient flow in the infinite-width limit under the standard Sobolev nonparametric regression model.

What would settle it

A numerical experiment with large but finite width networks showing that the adversarial risk after early stopping exceeds the derived minimax rate on Sobolev test functions would disprove the achievement claim.
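Such an experiment needs an empirical proxy for the adversarial risk. A minimal sketch, with a hypothetical helper and a 1D grid search standing in for the inner supremum over the perturbation ball:

```python
import numpy as np

def adversarial_risk(f_hat, f_star, xs, eps, grid=41):
    # Empirical adversarial risk: for each test point x, the adversary picks
    # the worst x' in [x - eps, x + eps]; the grid search approximates the sup.
    worst = []
    for x in xs:
        deltas = np.linspace(-eps, eps, grid)
        worst.append(np.max((f_hat(x + deltas) - f_star(x)) ** 2))
    return float(np.mean(worst))

# Toy check with known functions (illustrative, not the paper's estimator):
f_star = np.sin
f_hat = lambda x: np.sin(x) + 0.1          # deliberately biased estimate
xs = np.linspace(0.0, 2 * np.pi, 100)
print(adversarial_risk(f_hat, f_star, xs, eps=0.0))  # plain L2 risk, exactly 0.1^2
print(adversarial_risk(f_hat, f_star, xs, eps=0.3))  # strictly larger under attack
```

Running this on a finite-width network after early stopping, and comparing against the n^{-2s/(2s+d)}-type rate, is the shape of the falsification test described above.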

Figures

Figures reproduced from arXiv: 2604.25965 by Yuxuan Hou.

Figure 1. Evolution of adversarial risk R_A over training time t in the exact NTK regime. From left to right: 1D synthetic data with Gaussian noise, the real-world Diabetes regression dataset on S^{d−1}, and high-dimensional (d = 5) synthetic data. The universally consistent U-shaped curves highlight the fundamental necessity of early stopping (or equivalent spectral regularization) to prevent the severe degradation of adversaria…
Figure 2. Left: training dynamics of a wide ReLU network. Right: function-space visualization.
Figure 3. Evolution of adversarial risk R_A over training time t with α-trimming smoothing.
Original abstract

Deep learning models are widely deployed in safety-critical domains, but remain vulnerable to adversarial attacks. In this paper, we study the adversarial robustness of NTK neural networks in the context of nonparametric regression. We establish minimax optimal rates for adversarial regression in Sobolev spaces and then show that NTK neural networks, trained via gradient flow with early stopping, can achieve this optimal rate. However, in the overfitting regime, we prove that the minimum norm interpolant is vulnerable to adversarial perturbations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies adversarial robustness of NTK neural networks for nonparametric regression. It first derives minimax optimal rates for adversarial regression over Sobolev balls, then shows that infinite-width NTK networks trained by gradient flow with early stopping attain these rates. It further proves that the minimum-norm interpolant is vulnerable to adversarial perturbations in the overfitting regime.

Significance. If the central claims hold, the work provides a clean theoretical link between adversarial robustness, kernel gradient flow, and early stopping in the NTK regime. It supplies explicit minimax rates and identifies a concrete training procedure that achieves them, which is a positive contribution to the literature connecting nonparametric statistics with overparameterized models.

major comments (2)
  1. [Minimax rate section] The minimax lower bound derivation for adversarial regression (presumably in the section establishing the rate) must be checked against the precise definition of the adversarial loss; if the perturbation ball is taken in the same Sobolev norm as the function class, the rate may reduce to the standard nonparametric rate rather than a genuinely harder adversarial one.
  2. [NTK training dynamics section] The argument that early-stopped NTK gradient flow attains the minimax rate relies on the equivalence to kernel ridge regression with a specific stopping time; the manuscript should explicitly verify that the chosen stopping time is independent of the unknown smoothness parameter and does not require oracle knowledge of the Sobolev radius.
minor comments (2)
  1. Notation for the adversarial perturbation radius and the Sobolev ball radius should be distinguished more clearly to avoid confusion between the two parameters.
  2. The statement that the min-norm interpolant is vulnerable should include a quantitative lower bound on the adversarial risk rather than a qualitative claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive feedback on our manuscript. We address each major comment below with clarifications and proposed revisions where appropriate.

Point-by-point responses
  1. Referee: [Minimax rate section] The minimax lower bound derivation for adversarial regression (presumably in the section establishing the rate) must be checked against the precise definition of the adversarial loss; if the perturbation ball is taken in the same Sobolev norm as the function class, the rate may reduce to the standard nonparametric rate rather than a genuinely harder adversarial one.

    Authors: We thank the referee for highlighting this point. In the paper, the adversarial loss is defined using perturbations in the Euclidean norm on the input space (||δ||_2 ≤ ε for a fixed ε > 0), while the function class is the Sobolev ball of radius R in the appropriate Sobolev norm on the function space. The lower bound construction uses a packing argument over the Sobolev ball, where the adversary's sup over perturbations increases the effective separation needed between hypotheses, yielding a strictly slower rate than the non-adversarial minimax rate (specifically, the exponent worsens by a term depending on ε and the dimension). We will add an explicit remark in the minimax section clarifying the distinct norms and confirming that the adversarial problem is genuinely harder. revision: yes

  2. Referee: [NTK training dynamics section] The argument that early-stopped NTK gradient flow attains the minimax rate relies on the equivalence to kernel ridge regression with a specific stopping time; the manuscript should explicitly verify that the chosen stopping time is independent of the unknown smoothness parameter and does not require oracle knowledge of the Sobolev radius.

    Authors: We appreciate the referee's suggestion for greater clarity on adaptivity. The stopping time in our gradient flow analysis is selected to match the bias-variance tradeoff for the unknown smoothness s (of the form t ≈ n^{2s/(2s+d)} up to constants depending on R), which is standard for achieving the exact minimax rate. While the theoretical statement assumes knowledge of s for the precise rate, we will revise the manuscript to note that the stopping time can be chosen in a data-driven manner (e.g., via cross-validation or Lepski's method) without oracle knowledge of s or R, at the cost of possible logarithmic factors in the rate. This addresses the practical concern while preserving the main equivalence to early-stopped KRR. revision: partial
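The data-driven selection the authors point to can be sketched with a plain holdout split (illustrative; Lepski's method would replace the grid-plus-validation step in a fully adaptive analysis). The Laplacian stand-in kernel and all names here are assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120
X = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * X) + 0.3 * rng.normal(size=n)

# Fit the kernel gradient flow on `tr`; pick the stopping time on `va`,
# with no knowledge of the smoothness s or the Sobolev radius R.
tr, va = np.arange(90), np.arange(90, n)
K = np.exp(-np.abs(X[:, None] - X[None, :]))  # Sobolev-equivalent stand-in kernel
evals, evecs = np.linalg.eigh(K[np.ix_(tr, tr)])

def predict(t, idx):
    # Spectral-filter form of kernel gradient flow stopped at time t.
    filt = (1.0 - np.exp(-t * evals / len(tr))) / evals
    alpha = evecs @ (filt * (evecs.T @ y[tr]))
    return K[np.ix_(idx, tr)] @ alpha

ts = np.logspace(0, 6, 25)
val_err = [float(np.mean((predict(t, va) - y[va]) ** 2)) for t in ts]
t_hat = float(ts[int(np.argmin(val_err))])
print(f"data-driven stopping time: {t_hat:.1f}")
```

The selected time sits strictly between the underfit (small t) and interpolating (large t) extremes, which is the bias-variance tradeoff the rebuttal describes, recovered without oracle knowledge of s or R.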

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central argument consists of two independent components: first, deriving minimax optimal rates for adversarial nonparametric regression over Sobolev balls using standard statistical theory, and second, showing that infinite-width NTK gradient flow with early stopping attains those rates via kernel analysis. Neither step reduces to the other by construction, fitted parameters, or self-citation chains; the early-stopping regime is distinguished from the vulnerable min-norm interpolant using established regularization properties. The derivation remains self-contained against external benchmarks without load-bearing self-referential reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claims rest on standard nonparametric statistics and kernel theory; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • standard math Sobolev space smoothness class for the regression functions
    Defines the function class in which minimax rates are derived.
  • domain assumption NTK gradient flow exactly describes infinite-width network training
    Standard assumption in the NTK literature invoked for the training analysis.

pith-pipeline@v0.9.0 · 5359 in / 1185 out tokens · 47308 ms · 2026-05-07T15:04:08.926928+00:00 · methodology


    and the NTK kernel in this paper are different up to adding 1, and noticing that 1 lies in the Sobolev space, for the NTK kernel in our setting, we also have that the RKHS of NTK in a bounded domain with smooth boundary is a Sobolev class. Thus, the NTK kernel and the exponential kernelk(x, y) =e−|x−y| are equivalent in a bounded smooth boundary domain, a...