Evasion Attacks against Machine Learning at Test Time
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Roles: background (2)
citing papers explorer
- Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks
  A new 507-leaf taxonomy and 4x6 Target x Technique matrix audits six LLM attack benchmarks and finds they cover at most 25% of the threat surface with entire STRIDE categories untested.
- Online Learning-to-Defer with Varying Experts
  Presents first online L2D algorithm for multiclass classification with bandit feedback and varying experts, achieving O((n+n_e)T^{2/3}) regret generally and O((n+n_e)√T) under low noise.
- When AI reviews science: Can we trust the referee?
  AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.
- Explaining Machine Learning and Memorization with Statistical Mechanics
  Thesis uses statistical mechanics to study DAM and RBM models for understanding memorization, low-dimensional learning, and adversarial robustness in neural networks.