Towards Deep Learning Models Resistant to Adversarial Attacks

Adrian Vladu, Aleksandar Makelov, Aleksander Madry, Dimitris Tsipras, Ludwig Schmidt · 2017 · stat.ML · arXiv 1706.06083

Mixed citation behavior. Most common role is background (68%).

162 Pith papers citing it

Background 68% of classified citations

abstract

Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.
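
The "first-order adversary" in the abstract is commonly instantiated as projected gradient descent (PGD): repeatedly step in the direction that increases the loss, then project back into a small ball around the original input. The following is a minimal sketch of that idea, assuming a toy linear logistic classifier and an l_inf ball rather than the deep networks used in the paper. The function names, step size, and budget `epsilon` are illustrative choices, not values taken from the paper or its repositories.

```python
import numpy as np

def loss_and_input_grad(w, b, x, y):
    """Logistic loss of a linear classifier and its gradient w.r.t. the input x.

    Toy stand-in for a network: w and x are 1-D arrays, b is a scalar,
    and y is a label in {0, 1}.
    """
    z = float(np.dot(w, x) + b)
    p = 1.0 / (1.0 + np.exp(-z))      # predicted P(y = 1 | x)
    tiny = 1e-12                      # guards against log(0)
    loss = -(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    grad_x = (p - y) * w              # d(loss)/dx for the logistic loss
    return loss, grad_x

def pgd_linf(w, b, x, y, epsilon=0.1, step=0.02, n_steps=20, seed=0):
    """Projected gradient ascent on the loss inside an l_inf ball around x."""
    rng = np.random.default_rng(seed)
    # Start from a random point inside the ball (illustrative choice).
    x_adv = x + rng.uniform(-epsilon, epsilon, size=x.shape)
    for _ in range(n_steps):
        _, g = loss_and_input_grad(w, b, x_adv, y)
        x_adv = x_adv + step * np.sign(g)                 # signed ascent step
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project onto the ball
    return x_adv
```

In adversarial training, the inner loop above would produce the perturbed inputs on which the model parameters are then updated. This sketch covers only the attack side.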

citation-role summary

background: 22 · method: 6

citation-polarity summary

background: 19 · use method: 6 · unclear: 3

authors

Adrian Vladu, Aleksandar Makelov, Aleksander Madry, Dimitris Tsipras, Ludwig Schmidt

representative citing papers

On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

cs.CR · 2026-05-10 · conditional · novelty 8.0

Image-to-3D models successfully generate harmful geometries in most cases with under 0.3% caught by commercial filters; existing safeguards are weak but a stacked defense cuts harmful outputs to under 1% at 11% false-positive cost.

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle

math.OC · 2026-05-09 · unverdicted · novelty 8.0

Local LMO is a new projection-free method that achieves the convergence rates of projected gradient descent for constrained optimization by using local linear minimization oracles over small balls.

Fortifying Time Series: DTW-Certified Robust Anomaly Detection

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

First DTW-certified robust anomaly detection for time series via randomized smoothing adapted through an l_p-to-DTW lower-bound transformation.

Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks

cs.CR · 2026-01-20 · unverdicted · novelty 8.0

FPR manipulation attack perturbs benign MQTT packets to flip labels to attacks in NIDS with 80-100% success, increasing SOC delays without gradient-based methods.

A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP

cs.CV · 2026-06-29 · unverdicted · novelty 7.0 · 2 refs

A^4D detects adversarial attacks in an attack- and classifier-agnostic way by measuring non-arbitrary shifts in CLIP embedding space from prompt-based similarity scores.

Shoot the Honey, Cloak the Player: Towards Zero-Runtime-Overhead Proactive Defense and Detection for Visual Game Cheating

cs.CR · 2026-06-24 · unverdicted · novelty 7.0

AimTrap is an end-to-end system using Adversarial Camouflage Textures (ACT) and Adversarial Honeypot Textures (AHT) synthesized via differentiable rendering to defend against and detect visual aimbots, with reported success rates of 85.1% and 96.9% and negligible overhead.

Accelerated and Stable Convergence with Anchored Optimistic Method

math.OC · 2026-06-19 · unverdicted · novelty 7.0

GOMA achieves optimal last-iterate O(1/k²) convergence in deterministic monotone Lipschitz VIs and O(1/√k) in stochastic unbounded-variance settings without variance reduction.

Robustness Verification of Recurrent Neural Networks with Abstraction Refinement

cs.LG · 2026-06-10 · unverdicted · novelty 7.0

Abstraction-refinement framework with SHAP-guided timestep selection improves certified robustness verification success and margin tightness for RNNs over abstraction-only baselines.

Improving Adversarial Transferability on Vision-Language Pre-training Models via Surrogate-Specific Bias Correction

cs.CV · 2026-06-09 · unverdicted · novelty 7.0

DeBias-Attack corrects surrogate-specific bias in adversarial gradients for VLP models by subtracting the projection from a reference branch optimized on weak-semantic images.

Adversarial Robustness of Activation Steering in Large Language Models

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

First systematic test shows activation steering robustness drops sharply (up to 64%) under adversarial input perturbations across multiple extraction methods, models, and personas.

Anti-Hyperspectral Anomaly Detection: A First Study on Stealthy Lipschitz-Forcing Perturbations Against Unknown Detectors

eess.IV · 2026-06-03 · unverdicted · novelty 7.0

Develops the first AHAD method using ARAB regularization and Lipschitz-forcing perturbations to produce one energy-efficient signal that evades multiple unknown benchmark HAD detectors.

Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

High-noise feature drift distinguishes adversarial from clean inputs in CLIP, allowing a plug-in gating mechanism to selectively trigger existing test-time defenses and raise mean clean+adversarial accuracy across 13 datasets.

When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

Concept-level adversarial attacks exploit CBM interpretability on the CUB dataset, but SPECTRA raises required perturbation norm from 0.46 to over 4200 while keeping accuracy loss under 2.2%.

Where Detectors Fail: Probing Generative Space for Generalizable AI-Generated Image Detection

cs.CV · 2026-05-24 · unverdicted · novelty 7.0

PROBE improves AIGI detector generalization to unseen generators by using the detector as a critic to steer manifold-level modifications that produce challenging training samples.

Codec-Robust Attacks on Audio LLMs

cs.SD · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.

Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

Derives ODE limits of Adam-DA showing that first- and second-order momentum parameters reverse their convergence roles in zero-sum games compared to minimization, validated on GAN experiments.

Stress-Testing Neural Network Verifiers with Provably Robust Instances

cs.LG · 2026-05-16 · conditional · novelty 7.0

A reusable framework generates verification instances with provably known robustness labels, revealing numeric tolerance issues and bugs in five verifiers while introducing difficulty profiles to diagnose failure modes.

AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.

AuraMask: An Extensible Pipeline for Developing Aesthetic Anti-Facial Recognition Image Filters

cs.CV · 2026-05-13 · conditional · novelty 7.0

AuraMask produces 40 aesthetic anti-facial recognition filters that match or exceed prior adversarial effectiveness and achieve significantly higher user acceptance in a 630-person study.

GaitProtector: Impersonation-Driven Gait De-Identification via Training-Free Diffusion Latent Optimization

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

GaitProtector optimizes diffusion model latents to impersonate target identities in gait sequences, dropping Rank-1 identification accuracy from 89.6% to 15.0% on CASIA-B while keeping scoliosis diagnostic accuracy at 74.2%.

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.

Inference Time Causal Probing in LLMs

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

HDMI is a new probe-free technique that steers LLM hidden states via margin objectives to achieve more reliable causal interventions than prior probe-based methods on standard benchmarks.

Minimum Specification Perturbation: Robustness as Distance-to-Falsification in Causal Inference

stat.ME · 2026-05-02 · unverdicted · novelty 7.0

MSP quantifies the minimum changes to analyst choices required to falsify a causal claim by making its confidence interval contain zero, providing information orthogonal to dispersion-based robustness summaries.

Quantum Interval Bound Propagation for Certified Training of Quantum Neural Networks

quant-ph · 2026-05-01 · unverdicted · novelty 7.0

QIBP adapts interval bound propagation to quantum neural networks for certified adversarial robustness via interval and affine arithmetic implementations.

citing papers explorer

Showing 50 of 162 citing papers.

First-Order Methods for Solving Convex (Strongly) Concave Minimax Problems with Functional Constraints math.OC · 2026-06-17 · unverdicted · none · ref 282 · internal anchor
PALM achieves Õ(ε^{-1}) first-order complexity for ε-KKT points in convex-strongly-concave minimax problems with functional constraints and Õ(ε^{-3/2}) for the dual in the convex-concave case.
TS-Fault: Benchmarking Time Series Forecasters Against Structural Faults cs.LG · 2026-06-16 · unverdicted · none · ref 18 · internal anchor
TS-Fault benchmark finds clean-data accuracy anti-correlates with robustness to structural faults, with all catastrophic failures under mechanism-level faults and foundation models most fragile.
MorphStrata: Layer-Specific Perturbations for Generating Morphence Students in Time-Series Moving Target Defense cs.LG · 2026-06-16 · unverdicted · none · ref 11 · internal anchor
MorphStrata generates heterogeneous student models via layer-specific perturbations in a Transformer-based Morphence MTD setup, reporting RMSE gains up to 24% and 98% on AEP data under FGSM and BIM attacks with under 1% training time increase.
Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization cs.LG · 2026-06-10 · unverdicted · none · ref 42 · internal anchor
RL training disrupts gradient-based adversarial attacks by inducing unstable low-magnitude gradients that limit the effectiveness of methods like PGD within practical budgets.
Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning cs.CL · 2026-06-09 · unverdicted · none · ref 12 · internal anchor
SDBN introduces adversarial training to PEFT via two variants using character-level edits and LLM-generated perturbations, claiming improved robustness and generalization on NLP benchmarks in low-resource noisy settings.
Defending Against Malicious Finetuning by Scaling Train-time Adversarial Attacks cs.CL · 2026-06-06 · unverdicted · none · ref 16 · internal anchor
Patcher improves LLM robustness to malicious full-parameter finetuning by scaling train-time adversarial attacks in a bi-level optimization loop and supplies an efficient parallel implementation.
Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy cs.AI · 2026-06-06 · unverdicted · none · ref 34 · internal anchor
A new stress-testing framework for medical LLMs reveals hidden safety failures in quantized and medically fine-tuned models that standard benchmarks miss.
RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing cs.CR · 2026-06-04 · unverdicted · none · ref 1 · internal anchor
RedEdit finds that fewer than two photo edits on average let 76.2% of unsafe images evade detectors while retaining 93.0% of malicious semantics.
Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms cs.LG · 2026-06-03 · unverdicted · none · ref 74 · internal anchor
Proposes spectral norm of Fisher Information Matrix as attack-agnostic robustness metric with closed-form bounds for common architectures and correlation to adversarial vulnerability.
Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models cs.CL · 2026-06-02 · unverdicted · none · ref 28 · internal anchor
Adversarial images transfer across languages in MLLMs while apparent safety in weaker languages stems from comprehension and visual-grounding failures rather than genuine alignment.
Sensitivity as a Double-Edged Sword: A Trade-off Between Discriminability and Adversarial Robustness cs.CV · 2026-06-01 · unverdicted · none · ref 33 · internal anchor
Identifies sensitivity as the source of both discriminability and vulnerability in FC classifiers versus robustness in l2 classifiers, and introduces HPM prototype fusion plus MSA evaluation to improve adversarial robustness.
RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes cs.CV · 2026-05-30 · unverdicted · none · ref 21 · internal anchor
RoboStressBench decomposes visual stress into four physically grounded dimensions to benchmark VLM robustness in embodied scenes and proposes a stress-aware solver.
Benchmarking Bilevel Derivative-Free Optimization Algorithms math.OC · 2026-05-28 · unverdicted · none · ref 46 · internal anchor
Introduces a refereeing procedure and full computational cost accounting to improve benchmarking fairness for bilevel derivative-free optimization algorithms.
Landseer: Exploring the Machine Learning Defense Landscape cs.CR · 2026-05-26 · unverdicted · none · ref 64 · internal anchor
Landseer offers a containerized modular system to integrate and evaluate combinations of machine learning defenses, with an initial analysis of 35 defenses highlighting replicability challenges.
Closed-Loop Bidirectional Prompting for Adversarial Robustness of Vision Language Models cs.CV · 2026-05-25 · unverdicted · none · ref 24 · internal anchor
Introduces Closed-Loop Bidirectional Prompting with Semantic Anchor for cross-modal agreement recovery, claiming SOTA adversarial robustness and generalization on 11 datasets.
Certified Robustness from Approximate Gaussian Mixture Structures in Pretrained Latent Spaces cs.LG · 2026-05-25 · unverdicted · none · ref 12 · internal anchor
Approximate Gaussian mixture structure in pretrained latent spaces yields certified robustness with graceful degradation bounds.
Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics cs.LG · 2026-05-21 · unverdicted · none · ref 101 · internal anchor
SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.
Attention Hijacking: Response Manipulation Across Queries in Vision-Language Models cs.CV · 2026-05-17 · unverdicted · none · ref 26 · internal anchor
Attention Hijacking is a new attack that improves cross-query transferability in VLMs by explicitly steering internal attention to a persistent image-dominant pattern.
Compositional Adversarial Training for Robust Visual Watermarking cs.CV · 2026-05-16 · unverdicted · none · ref 2 · internal anchor
CAT trains watermark detectors against adaptive compositional adversaries using differentiable attack selection, yielding up to 63.5% capacity gains on hard attacks versus random-augmentation baselines.
Right Predictions, Misleading Explanations: On the Vulnerability of Vision-Language Model Explanations cs.CV · 2026-05-15 · unverdicted · none · ref 7 · 2 links · internal anchor
X-Shift is a grey-box attack that perturbs patch-level visual features in VLMs to shift explanation heatmaps without changing the predicted output.
DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models cs.CR · 2026-05-15 · unverdicted · none · ref 40 · internal anchor
DarkLLM trains an LLM to generate language-driven adversarial perturbations that unify targeted, untargeted, segmentation, and multi-model attacks on foundation models.
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations cs.CL · 2026-05-12 · unverdicted · none · ref 155 · internal anchor
REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-source and reasoning LLMs.
Fair Conformal Classification via Learning Representation-Based Groups cs.LG · 2026-05-12 · unverdicted · none · ref 6 · internal anchor
A fair conformal classification method guarantees conditional coverage on adaptively identified subgroups defined via learned representations.
Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning cs.AI · 2026-05-12 · unverdicted · none · ref 34 · internal anchor
Seirênes trains LLMs via adversarial self-play to generate and overcome evolving distractions, producing gains of 7-10 points on math reasoning benchmarks and exposing blind spots in larger models.
Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing cs.CR · 2026-05-11 · unverdicted · none · ref 40 · internal anchor
DR-Smoothing introduces a disrupt-then-rectify prompt processing scheme into smoothing defenses, delivering tight theoretical bounds on success probability against both token- and prompt-level jailbreaks.
"Training robust watermarking model may hurt authentication!" Exploring and Mitigating the Identity Leakage in Robust Watermarking cs.CR · 2026-05-10 · unverdicted · none · ref 63 · internal anchor
W-IR is the first watermarking framework to combine certified robustness via randomized smoothing in pixel and coordinate spaces with identity leakage mitigation via residual information loss minimization.
Efficient Verification of Neural Control Barrier Functions with Smooth Nonlinear Activations cs.LG · 2026-05-08 · unverdicted · none · ref 35 · internal anchor
LightCROWN computes tighter Jacobian bounds for neural networks with smooth nonlinear activations by exploiting their analytical properties, raising verification success rates for neural control barrier functions up to 100% on benchmark control systems.
Beyond Defenses: Manifold-Aligned Regularization for Intrinsic 3D Point Cloud Robustness cs.CV · 2026-05-08 · unverdicted · none · ref 20 · internal anchor
MAPR improves adversarial robustness in 3D point cloud networks by aligning latent predictions with intrinsic manifold geometry via curvature/diffusion features and a consistency loss.
Uncovering Hidden Systematics in Neural Network Models for High Energy Physics cs.LG · 2026-05-08 · unverdicted · none · ref 6 · internal anchor
Neural networks for HEP tasks can be fooled at significant rates by subtle perturbations inside uncertainty envelopes, revealing hidden systematics not captured by conventional methods.
Band Together: Untargeted Adversarial Training with Multimodal Coordination against Evasion-based Promotion Attacks cs.LG · 2026-05-07 · unverdicted · none · ref 10 · internal anchor
UAT-MC improves defense against evasion promotion attacks in multimodal recommenders by aligning gradients across modalities during untargeted adversarial training.
Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours cs.AI · 2026-05-05 · unverdicted · none · ref 2 · internal anchor
An agentic red teaming system automates creation of adversarial testing workflows from natural language goals, unifying ML and generative AI attacks and achieving 85% success rate on Meta Llama Scout with no custom human code.
Detecting Adversarial Data via Provable Adversarial Noise Amplification cs.LG · 2026-05-04 · unverdicted · none · ref 19 · internal anchor
A provable adversarial noise amplification theorem under sufficient conditions enables a custom-trained detector that identifies adversarial examples at inference time using enhanced layer-wise noise signals.
Stability and Generalization for Decentralized Markov SGD cs.LG · 2026-05-03 · unverdicted · none · ref 6 · internal anchor
Decentralized SGD and SGDA under Markovian sampling admit non-asymptotic generalization bounds that incorporate network topology, Markov mixing rates, and primal-dual dynamics.
LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training cs.CR · 2026-05-02 · unverdicted · none · ref 27 · internal anchor
LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.
VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models cs.CR · 2026-05-02 · conditional · none · ref 20 · internal anchor
Universal adversarial attacks cause output perturbation 90 times more often than precise target injection in VLMs, with only 2 verbatim successes out of 6615 tests.
The Power of Order: Fooling LLMs with Adversarial Table Permutations cs.LG · 2026-05-01 · unverdicted · none · ref 32 · 2 links · internal anchor
Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.
Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders quant-ph · 2026-04-30 · unverdicted · none · ref 9 · internal anchor
A quantum autoencoder purifies adversarial perturbations for quantum classifiers and supplies a confidence score for unrecoverable inputs, claiming up to 68% accuracy gains over prior defenses without adversarial training.
Controlled Steering-Based State Preparation for Adversarial-Robust Quantum Machine Learning quant-ph · 2026-04-30 · unverdicted · none · ref 21 · internal anchor
A passive steering method for quantum state preparation improves adversarial accuracy in QML models by up to 40% across tested cases.
When AI reviews science: Can we trust the referee? cs.AI · 2026-04-26 · unverdicted · none · ref 65 · internal anchor
AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.
Transferable Physical-World Adversarial Patches Against Pedestrian Detection Models cs.CV · 2026-04-24 · unverdicted · none · ref 49 · internal anchor
TriPatch generates transferable physical adversarial patches via multi-stage triplet loss, appearance consistency, and data augmentation to achieve higher attack success rates on pedestrian detectors than prior methods.
FastAT Benchmark: A Comprehensive Framework for Fair Evaluation of Fast Adversarial Training Methods cs.CV · 2026-04-22 · conditional · none · ref 1 · internal anchor
The FastAT Benchmark standardizes evaluation of over twenty fast adversarial training methods under unified conditions, showing that well-designed single-step approaches can match or exceed PGD-AT robustness at lower training cost on CIFAR-10, CIFAR-100, and Tiny-ImageNet.
If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems cs.CV · 2026-04-21 · unverdicted · none · ref 28 · internal anchor
LVLM-based agents exhibit trust boundary confusion with visual injections and a multi-agent defense separating perception from decision-making reduces misleading responses while preserving correct ones.
Representation-Guided Parameter-Efficient LLM Unlearning cs.CL · 2026-04-19 · unverdicted · none · ref 66 · internal anchor
REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.
Latent Instruction Representation Alignment: defending against jailbreaks, backdoors and undesired knowledge in LLMs cs.LG · 2026-04-12 · unverdicted · none · ref 25 · internal anchor
LIRA aligns latent instruction representations in LLMs to defend against jailbreaks, backdoors, and undesired knowledge, blocking over 99% of PEZ attacks and achieving optimal WMDP forgetting.
Quantum Patches: Enhancing Robustness of Quantum Machine Learning Models quant-ph · 2026-04-09 · unverdicted · none · ref 9 · internal anchor
Random quantum circuits used as adversarial training data reduce successful attack rates on QML models for CIFAR-10 from 89.8% to 68.45% and for CINIC-10 from 94.23% to 78.68%.
Compression as an Adversarial Amplifier Through Decision Space Reduction cs.CV · 2026-04-08 · unverdicted · none · ref 30 · internal anchor
Compression acts as an adversarial amplifier by reducing the decision space of image classifiers, making attacks in compressed representations substantially more effective than pixel-space attacks under the same perturbation budget.
Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models cs.CR · 2026-04-07 · unverdicted · none · ref 28 · internal anchor
Introduces a text-guided backdoor attack using common textual words as triggers and visual perturbations for stealthy, adjustable control on multimodal pretrained models.
Agent-Sentry: Bounding LLM Agents via Execution Provenance cs.CR · 2026-03-24 · unverdicted · none · ref 21 · internal anchor
Agent-Sentry bounds LLM agent executions via structural provenance classification, sensitive-value allowlists, and selective LLM judgment, blocking 94.3% of injections while allowing 95.1% of benign actions on AgentDojo and AgentDyn.
Shapes are not enough: CONSERVAttack and its use for finding vulnerabilities and uncertainties in machine learning applications cs.LG · 2026-03-14 · unverdicted · none · ref 9 · internal anchor
CONSERVAttack creates adversarial perturbations in HEP ML models that respect uncertainty bounds but cause misclassifications, revealing gaps in current validation practices.
Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning cs.LG · 2026-03-10 · unverdicted · none · ref 45 · internal anchor
CPNS regularization with dual counterfactual generators mitigates intra-task and inter-task spurious correlations in class-incremental learning feature expansion.
