pith. machine review for the scientific record.

arxiv: 1801.04354 · v5 · submitted 2018-01-13 · 💻 cs.CL · cs.CR · cs.IR · cs.LG

Recognition: unknown

Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

Authors on Pith: no claims yet
classification 💻 cs.CL · cs.CR · cs.IR · cs.LG
keywords text · black-box · deepwordbug · adversarial · attacks · average · been · classification

Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that force a deep-learning classifier to misclassify a text input. We employ novel scoring strategies to identify the critical tokens that, if modified, cause the classifier to make an incorrect prediction. Simple character-level transformations are applied to the highest-ranked tokens in order to minimize the edit distance of the perturbation, yet change the original classification. We evaluated DeepWordBug on eight real-world text datasets, covering text classification, sentiment analysis, and spam detection. We compare the results of DeepWordBug with two baselines: Random (Black-box) and Gradient (White-box). Our experimental results indicate that DeepWordBug reduces the prediction accuracy of current state-of-the-art deep-learning models, including a decrease of 68% on average for a Word-LSTM model and 48% on average for a Char-CNN model.
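The two-stage idea in the abstract (score tokens using only classifier outputs, then apply a small character-level edit to the top-ranked tokens) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not specify its scoring strategies, so the sketch substitutes a simple leave-one-out score, and `toy_classifier`, `score_tokens`, `delete_middle_char`, and `attack` are all hypothetical names introduced here for illustration.

```python
import math
from typing import Callable, List

Classifier = Callable[[List[str]], float]


def toy_classifier(tokens: List[str]) -> float:
    """Hypothetical black-box model: returns P(positive) from keyword counts."""
    positive = {"great", "good", "excellent"}
    negative = {"bad", "awful", "boring"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def score_tokens(tokens: List[str], classify: Classifier) -> List[float]:
    """Assumed scoring: leave-one-out change in the output probability.

    Only classifier outputs are queried, so no gradients are needed.
    """
    base = classify(tokens)
    return [
        abs(base - classify(tokens[:i] + tokens[i + 1:]))
        for i in range(len(tokens))
    ]


def delete_middle_char(word: str) -> str:
    """Character-level edit with edit distance 1: drop the middle character."""
    if len(word) < 2:
        return word
    mid = len(word) // 2
    return word[:mid] + word[mid + 1:]


def attack(tokens: List[str], classify: Classifier, max_edits: int) -> List[str]:
    """Perturb the max_edits highest-scoring tokens and return the new sequence."""
    scores = score_tokens(tokens, classify)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    adversarial = list(tokens)
    for i in ranked[:max_edits]:
        adversarial[i] = delete_middle_char(adversarial[i])
    return adversarial


original = "the movie was great".split()
adversarial = attack(original, toy_classifier, max_edits=1)
print(adversarial, toy_classifier(original), toy_classifier(adversarial))
```

In this toy setting the misspelled token falls outside the classifier's vocabulary, which is the intuition behind character-level perturbations against word-level models; a real attack would query an actual model and would likely choose among several edit types.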

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Analyzing the Effect of Noise in LLM Fine-tuning

    cs.LG · 2026-04 · unverdicted · novelty 5.0

    Label noise hurts fine-tuning performance most while grammatical and typographical noise sometimes act as mild regularizers, with changes concentrated in task-specific layers.