Paper Title
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
Paper Authors
Paper Abstract
Strong adversarial examples are crucial for evaluating and enhancing the robustness of deep neural networks. However, the performance of popular attacks is usually sensitive, for instance, to minor image transformations, stemming from limited information -- typically only one input example, a handful of white-box source models, and undefined defense strategies. Hence, the crafted adversarial examples are prone to overfit the source model, which hampers their transferability to unknown architectures. In this paper, we propose an approach named Multiple Asymptotically Normal Distribution Attacks (MultiANDA), which explicitly characterizes adversarial perturbations from a learned distribution. Specifically, we approximate the posterior distribution over the perturbations by exploiting the asymptotic normality of stochastic gradient ascent (SGA), and then employ the deep-ensemble strategy as an effective proxy for Bayesian marginalization in this process, estimating a mixture of Gaussians that facilitates a more thorough exploration of the potential optimization space. The approximated posterior essentially describes the stationary distribution of the SGA iterates, which captures the geometric information around the local optimum. Thus, MultiANDA can draw an unlimited number of adversarial perturbations for each input while reliably maintaining their transferability. Extensive experiments on seven normally trained and seven defense models show that our method outperforms ten state-of-the-art black-box attacks on deep learning models with or without defenses.
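To make the pipeline in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the core idea: run stochastic gradient ascent on a perturbation, keep running first and second moments of the SGA iterates to fit a Gaussian (here simplified to a diagonal covariance), and then sample additional perturbations from that Gaussian. The function names, hyperparameters, augmentation scheme, and the diagonal-covariance simplification are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch of an ANDA-style attack loop (not the paper's official code).
# One run fits a single Gaussian over perturbations; repeating it from several
# independent random starts and pooling the samples plays the role of the
# deep-ensemble "mixture of Gaussians" described in the abstract.

import torch
import torch.nn.functional as F


def sga_gaussian(model, x, y, eps=16 / 255, alpha=2 / 255, steps=10, n_aug=5):
    """Fit a (diagonal) Gaussian over adversarial perturbations from SGA iterates."""
    delta = torch.zeros_like(x, requires_grad=True)
    mean = torch.zeros_like(x)      # running mean of the iterates
    sq_mean = torch.zeros_like(x)   # running mean of squared iterates

    for t in range(1, steps + 1):
        grads = torch.zeros_like(x)
        for _ in range(n_aug):  # stochasticity via random input augmentations
            x_aug = (x + delta + 0.05 * torch.randn_like(x)).clamp(0, 1)
            loss = F.cross_entropy(model(x_aug), y)
            grads += torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += alpha * grads.sign()          # SGA step (ascent on the loss)
            delta.clamp_(-eps, eps)
            mean += (delta - mean) / t             # iterate averaging
            sq_mean += (delta ** 2 - sq_mean) / t

    var = (sq_mean - mean ** 2).clamp_min(1e-12)   # diagonal covariance estimate
    return mean.detach(), var.sqrt()


def sample_perturbations(mean, std, eps, n_samples=4):
    """Draw any number of perturbations from the fitted Gaussian."""
    return [(mean + std * torch.randn_like(mean)).clamp(-eps, eps)
            for _ in range(n_samples)]
```

In this sketch, calling `sga_gaussian` several times (e.g. with different random seeds or augmentation draws) and concatenating the outputs of `sample_perturbations` would approximate sampling from the ensemble-induced mixture, which is what allows an unlimited number of adversarial examples to be generated per input.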