Paper Title

On the Minimal Adversarial Perturbation for Deep Neural Networks with Provable Estimation Error

Authors

Fabio Brau, Giulio Rossolini, Alessandro Biondi, Giorgio Buttazzo

Abstract

Although Deep Neural Networks (DNNs) have shown incredible performance in perceptive and control tasks, several trustworthiness issues are still open. One of the most discussed topics is the existence of adversarial perturbations, which has opened an interesting research line on provable techniques capable of quantifying the robustness of a given input. In this regard, the Euclidean distance of the input from the classification boundary denotes a well-proved robustness assessment as the minimal affordable adversarial perturbation. Unfortunately, computing such a distance is highly complex due to the non-convex nature of NNs. Although several methods have been proposed to address this issue, to the best of our knowledge, no provable results have been presented to estimate and bound the error committed. This paper addresses this issue by proposing two lightweight strategies to find the minimal adversarial perturbation. Unlike the state-of-the-art, the proposed approach allows formulating an error estimation theory of the approximate distance with respect to the theoretical one. Finally, a substantial set of experiments is reported to evaluate the performance of the algorithms and support the theoretical findings. The obtained results show that the proposed strategies approximate the theoretical distance for samples close to the classification boundary, leading to provable robustness guarantees against any adversarial attack.
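
As background for the abstract above, the minimal adversarial perturbation of an input coincides with its Euclidean distance to the closest decision boundary of the classifier. The sketch below illustrates this notion with a classical first-order (DeepFool-style) linearization of the logit differences; it is only an illustrative approximation, not one of the two strategies proposed in the paper, and the function and variable names (boundary_distance_estimate, model, x) are hypothetical.

```python
import torch
import torch.nn as nn


def boundary_distance_estimate(model: nn.Module, x: torch.Tensor) -> float:
    """First-order estimate of the Euclidean distance from a single input x
    to the closest classification boundary of `model`, obtained by
    linearizing the logit differences (DeepFool-style). Illustrative only:
    this is not the paper's method and carries no provable error bound.
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)   # shape: (num_classes,)
    pred = logits.argmax().item()               # currently predicted class

    best = float("inf")
    for k in range(logits.numel()):
        if k == pred:
            continue
        # f_k(x) - f_pred(x) is negative while x is classified as `pred`;
        # the boundary with class k is where this difference vanishes.
        diff = logits[k] - logits[pred]
        grad = torch.autograd.grad(diff, x, retain_graph=True)[0]
        # Linearized distance to the boundary between classes `pred` and `k`.
        dist = diff.abs().item() / (grad.norm().item() + 1e-12)
        best = min(best, dist)
    return best
```

For a trained network net and a single (unbatched) input tensor img, boundary_distance_estimate(net, img) returns the linearized estimate of the distance to the nearest boundary, i.e., the quantity that the paper's strategies aim to approximate with provable error bounds.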
