Adversarial Training with Fast Gradient Projection Method against Synonym Substitution based Text Attacks

Paper title

Adversarial Training with Fast Gradient Projection Method against Synonym Substitution based Text Attacks

Authors

Xiaosen Wang, Yichen Yang, Yihe Deng, Kun He

Abstract

Adversarial training is the most empirically successful approach for improving the robustness of deep neural networks in image classification. For text classification, however, existing adversarial attacks based on synonym substitution are effective but not efficient enough to be incorporated into practical text adversarial training. Gradient-based attacks, which are very efficient for images, are hard to implement for synonym substitution based text attacks because of lexical, grammatical and semantic constraints and the discrete text input space. We therefore propose a fast text adversarial attack method based on synonym substitution, called the Fast Gradient Projection Method (FGPM), which is about 20 times faster than existing text attack methods and can achieve similar attack performance. We then combine FGPM with adversarial training and propose a text defense method called Adversarial Training with FGPM enhanced by Logit pairing (ATFL). Experiments show that ATFL can significantly improve model robustness and block the transferability of adversarial examples.
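
The abstract does not spell out how a gradient can guide a discrete synonym swap. The sketch below is only an illustration of the general idea, not the authors' implementation. It assumes that each word has an embedding vector, that the loss gradient with respect to each embedding is already available, and that a candidate swap can be scored by projecting the embedding change onto that gradient (a first-order estimate of the loss increase). The function name `best_substitution` and all inputs are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_substitution(embeddings, grads, synonyms):
    """Return (score, position, word) for the single swap with the largest
    first-order estimate of loss increase, or None if no candidates exist.

    embeddings: list of embedding vectors, one per word position
    grads:      list of loss gradients, one per embedding (assumed given)
    synonyms:   list of {candidate_word: candidate_embedding}, one per position
    """
    best = None
    for i, (e, g) in enumerate(zip(embeddings, grads)):
        for word, e_syn in synonyms[i].items():
            # Change in embedding space if position i is swapped to `word`.
            delta = [s - x for s, x in zip(e_syn, e)]
            # Projection of that change onto the gradient.
            score = dot(g, delta)
            if best is None or score > best[0]:
                best = (score, i, word)
    return best


# Toy two-word input with hypothetical 2-d embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
grads = [[0.5, 0.5], [1.0, -1.0]]
synonyms = [{"good": [1.0, 1.0]}, {"film": [1.0, 1.0], "movie": [2.0, 0.0]}]
result = best_substitution(embeddings, grads, synonyms)
```

Scoring every candidate this way needs one gradient computation per step rather than one model query per candidate, which is one plausible reason a gradient-guided attack could be faster than query-based substitution search.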
