Paper Title
Detection Defense Against Adversarial Attacks with Saliency Map
Paper Authors
Paper Abstract
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible to human vision and can cause deep models to misbehave. Such a phenomenon may lead to severe, inestimable consequences in safety- and security-critical applications. Existing defenses tend to harden the robustness of models against adversarial attacks, e.g., adversarial training techniques. However, these are usually intractable to implement due to the high cost of re-training and the cumbersome operations of altering the model architecture or parameters. In this paper, we discuss the saliency map method from the viewpoint of enhancing model interpretability; it is similar to introducing an attention mechanism into the model, so as to comprehend the process of object identification by deep networks. We then propose a novel method that combines additional noise and utilizes an inconsistency strategy to detect adversarial examples. Our experimental results for representative adversarial attacks on popular models and common datasets, including ImageNet, show that our method can effectively detect all the attacks with a high detection success rate. We compare it with the existing state-of-the-art technique, and the experiments indicate that our method is more general.
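The abstract only outlines the detection idea: compute a saliency map for an input, perturb the input with additional noise, and flag the input as adversarial when the saliency maps are too inconsistent. The following is a minimal sketch of that idea, assuming a standard gradient-based saliency map, Gaussian input noise, and a cosine-similarity inconsistency score; the actual saliency estimator, noise model, metric, and threshold used by the paper are not given here, so these choices are placeholders.

```python
# Hedged sketch: saliency-inconsistency detection under added input noise.
# All hyperparameters (noise_std, threshold) and the model are illustrative only.
import torch
import torch.nn.functional as F
from torchvision import models


def saliency_map(model, x):
    """Gradient of the top-class score w.r.t. the input, one saliency value per pixel."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    top_class = logits.argmax(dim=1)
    score = logits.gather(1, top_class.unsqueeze(1)).sum()
    score.backward()
    # Collapse the channel dimension to obtain a (B, H, W) saliency map.
    return x.grad.detach().abs().max(dim=1).values


def detect_adversarial(model, x, noise_std=0.05, threshold=0.5):
    """Flag inputs whose saliency map changes too much under small input noise."""
    s_clean = saliency_map(model, x)
    s_noisy = saliency_map(model, x + noise_std * torch.randn_like(x))
    # Inconsistency score: 1 - cosine similarity between flattened saliency maps.
    sim = F.cosine_similarity(s_clean.flatten(1), s_noisy.flatten(1), dim=1)
    inconsistency = 1.0 - sim
    return inconsistency > threshold, inconsistency


if __name__ == "__main__":
    model = models.resnet18(weights=None).eval()  # placeholder model
    x = torch.rand(1, 3, 224, 224)                # placeholder input batch
    is_adv, score = detect_adversarial(model, x)
    print(is_adv, score)
```

The intuition behind this sketch is that a correctly classified clean image tends to keep a stable saliency pattern under small perturbations, while an adversarial example, sitting near a decision boundary, tends to show a markedly different saliency map once noise is added.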