Entropy-Based Non-Invasive Reliability Monitoring of Convolutional Neural Networks
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Introduces Bipredictability P with a provable bound P ≤ 0.5 from entropy subadditivity, showing responsive agency imposes an informational cost by suppressing P to ~0.33, validated across RL agents and other systems, plus an IDT architecture outperforming reward monitoring.
citing papers explorer
A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP
A^4D detects adversarial attacks in an attack- and classifier-agnostic way by measuring non-arbitrary shifts in CLIP embedding space from prompt-based similarity scores.