Paper Title
In and Out-of-Domain Text Adversarial Robustness via Label Smoothing
Paper Authors
Paper Abstract
Recently, it has been shown that state-of-the-art NLP models are vulnerable to adversarial attacks, where the predictions of a model can be drastically altered by slight modifications to the input (such as synonym substitutions). While several defense techniques have been proposed and adapted to the discrete nature of text adversarial attacks, the benefits of general-purpose regularization methods, such as label smoothing, for language models have not been studied. In this paper, we study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks, in both in-domain and out-of-domain settings. Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks. We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
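For context, standard label smoothing replaces the one-hot training target with a softened distribution: with smoothing parameter α and K classes, the true class receives probability 1 − α + α/K and every other class receives α/K. The sketch below shows how this plugs into fine-tuning a pre-trained BERT classifier; it is a minimal illustration, not the paper's exact setup. It assumes the Hugging Face transformers library and PyTorch's built-in label_smoothing option, and the model name, α = 0.1, learning rate, and toy batch are illustrative assumptions.

```python
# Minimal sketch: fine-tune a BERT classifier with label smoothing.
# Assumes `torch` and `transformers` are installed; hyperparameters
# (alpha=0.1, lr=2e-5) and the toy batch are illustrative only.
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Smoothed cross-entropy: the target puts 1 - alpha + alpha/K mass on the
# true class and alpha/K on each other class. PyTorch supports this natively.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["a great movie", "a terrible movie"]  # toy batch
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, return_tensors="pt")
logits = model(**batch).logits

optimizer.zero_grad()
loss = criterion(logits, labels)  # smoothed cross-entropy loss
loss.backward()
optimizer.step()
```

Using the loss function's label_smoothing argument keeps smoothing a one-line change to an otherwise standard fine-tuning loop, rather than requiring manually constructed soft target vectors.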