Adversarial robustness as a prior for learned representations

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Aleksander Madry · 2019 · arXiv 1906.00945

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Toy Models of Superposition

cs.LG · 2022-09-21 · accept · novelty 8.0

Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.

Adjoint Inversion Reveals Holographic Superposition and Destructive Interference in CNN Classifiers

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

CNN classifiers work by holographic superposition and destructive interference in pixel space rather than selecting cleaned features, as proven by a new adjoint inversion framework that also yields a covariance-volume channel selection algorithm.

Laundering AI Authority with Adversarial Examples

cs.CR · 2026-05-05 · unverdicted · novelty 5.0

Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.

citing papers explorer

Showing 3 of 3 citing papers.

Toy Models of Superposition · cs.LG · 2022-09-21 · accept · none · ref 14
Adjoint Inversion Reveals Holographic Superposition and Destructive Interference in CNN Classifiers · cs.CV · 2026-04-30 · unverdicted · none · ref 5
Laundering AI Authority with Adversarial Examples · cs.CR · 2026-05-05 · unverdicted · none · ref 21
