Deep Text Classification Can be Fooled

Bin Liang; Hongcheng Li; Miaoqiang Su; Pan Bian; Wenchang Shi; Xirong Li

arxiv: 1704.08006 · v2 · pith:OECDCGEGnew · submitted 2017-04-26 · 💻 cs.CR · cs.LG

keywords: adversarial, samples, text, attack, classification, classifiers, dnn-based, important

In this paper, we present an effective method for crafting adversarial text samples, revealing an important yet underestimated fact: DNN-based text classifiers are also prone to adversarial sample attacks. Specifically, depending on the adversarial scenario, the text items that are important for classification are identified either by computing the cost gradients of the input (white-box attack) or by generating a series of occluded test samples (black-box attack). Based on these items, we design three perturbation strategies, namely insertion, modification, and removal, to generate adversarial samples. The experimental results show that the adversarial samples generated by our method can successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers. The adversarial samples can be perturbed to any desired class without compromising their utility. At the same time, the introduced perturbation is difficult to perceive.
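
To make the black-box idea in the abstract concrete, here is a minimal Python sketch of occlusion-based importance scoring: each word is removed in turn, and the drop in the classifier's confidence is taken as that word's importance. This is an illustration under stated assumptions, not the authors' procedure. The abstract does not specify how occluded samples are built or scored, and `toy_confidence`, its keyword weights, and `occlusion_importance` are hypothetical names introduced here only for the example.

```python
from typing import Callable, List, Tuple


def toy_confidence(words: List[str]) -> float:
    """Hypothetical stand-in for a black-box classifier.

    Returns a confidence for a 'positive' class from made-up keyword weights.
    """
    weights = {"excellent": 0.3, "good": 0.1}  # assumed values, illustration only
    return 0.5 + sum(weights.get(w.lower(), 0.0) for w in words)


def occlusion_importance(
    words: List[str], predict: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Rank words by how much the confidence drops when each one is occluded."""
    base = predict(words)
    scores = []
    for i, word in enumerate(words):
        occluded = words[:i] + words[i + 1:]  # the sample with word i removed
        scores.append((word, base - predict(occluded)))
    # The largest confidence drop comes first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = occlusion_importance(
    "The plot was excellent and the cast was good".split(), toy_confidence
)
print(ranking[:2])
```

In this toy setup the two weighted keywords rank highest, which is the kind of signal an attacker could use to choose where to insert, modify, or remove text. Only queries to `predict` are needed, so no gradients or model internals are assumed.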

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Vulnerability of Natural Language Classifiers to Evolutionary Generated Adversarial Text
cs.AI 2026-06 unverdicted novelty 6.0

GAversary, a black-box genetic algorithm with GloVe-based mutations, generates adversarial examples that reduce NLP model accuracy more than BAE or A2T on benchmarks while perturbing more words.