On Adversarial Examples for Character-Level Neural Machine Translation

arxiv: 1806.09030 · v1 · pith:V7RMG3FX · submitted 2018-06-23 · 💻 cs.CL · cs.AI

Javid Ebrahimi, Daniel Lowd, Dejing Dou

keywords: adversarial, examples, black-box, robustness, translation, white-box, character-level, machine

Abstract

Evaluating on adversarial examples has become a standard procedure to measure the robustness of deep learning models. Due to the difficulty of creating white-box adversarial examples for discrete text input, most analyses of the robustness of NLP models have been done through black-box adversarial examples. We investigate adversarial examples for character-level neural machine translation (NMT), and contrast black-box adversaries with a novel white-box adversary, which employs differentiable string-edit operations to rank adversarial changes. We propose two novel types of attacks which aim to remove or change a word in a translation, rather than simply break the NMT. We demonstrate that white-box adversarial examples are significantly stronger than their black-box counterparts in different attack scenarios, which reveals more serious vulnerabilities than previously known. In addition, after performing adversarial training, which takes only 3 times longer than regular training, we can improve the model's robustness significantly.
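The white-box adversary ranks candidate character edits using gradients rather than by querying the model repeatedly. The sketch below illustrates that idea for substitutions (character flips) only, assuming a first-order (Taylor) estimate over one-hot character inputs in the style of HotFlip-type attacks. The function `rank_flips` and the toy gradient matrix are hypothetical; in a real attack the gradients would come from backpropagating the adversary's loss through the NMT model, and insertions and deletions would need their own scores.

```python
from typing import List, Tuple


def rank_flips(
    chars: List[int], grad: List[List[float]], top_k: int = 3
) -> List[Tuple[float, int, int]]:
    """Rank single-character substitutions by a first-order loss estimate.

    Hypothetical helper. grad[i][c] is assumed to be the gradient of the
    adversary's loss with respect to the one-hot entry for character c at
    position i. Flipping position i from character a to b changes the
    one-hot vector by (e_b - e_a), so the linearised loss change is
    grad[i][b] - grad[i][a].
    Returns (estimated_gain, position, new_char) tuples, best first.
    """
    candidates = []
    for i, a in enumerate(chars):
        for b, g in enumerate(grad[i]):
            if b == a:
                continue  # a flip must actually change the character
            candidates.append((g - grad[i][a], i, b))
    # Largest estimated loss increase first
    candidates.sort(key=lambda t: t[0], reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    # Toy example: 3-character vocabulary, input sequence [0, 2]
    toy_grad = [[0.25, 0.75, -0.5], [2.0, 0.5, 1.0]]
    print(rank_flips([0, 2], toy_grad, top_k=2))  # [(1.0, 1, 0), (0.5, 0, 1)]
```

Because the estimate needs only one forward and backward pass, every candidate flip can be scored at once, which is what makes a gradient-ranked white-box search cheaper than trying each edit against the model.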

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Scaling Laws for Reward Model Overoptimization
cs.LG 2022-10 unverdicted novelty 6.0

Synthetic measurements show that gold-standard performance degrades according to distinct functional forms when optimizing proxy reward models via RL or best-of-n, with coefficients scaling smoothly by reward model pa...