Paper Title
Adversarial Vulnerability of Randomized Ensembles
Paper Authors
Paper Abstract
Despite the tremendous success of deep neural networks across various tasks, their vulnerability to imperceptible adversarial perturbations has hindered their deployment in the real world. Recently, works on randomized ensembles have empirically demonstrated significant improvements in adversarial robustness over standard adversarially trained (AT) models with minimal computational overhead, making them a promising solution for safety-critical, resource-constrained applications. However, this impressive performance raises the question: Are these robustness gains provided by randomized ensembles real? In this work, we address this question both theoretically and empirically. We first establish theoretically that commonly employed robustness evaluation methods such as adaptive PGD provide a false sense of security in this setting. Subsequently, we propose a theoretically sound and efficient adversarial attack algorithm (ARC) capable of compromising randomized ensembles even in cases where adaptive PGD fails to do so. We conduct comprehensive experiments across a variety of network architectures, training schemes, datasets, and norms to support our claims, and empirically establish that randomized ensembles are in fact more vulnerable to $\ell_p$-bounded adversarial perturbations than even standard AT models. Our code can be found at https://github.com/hsndbk4/ARC.
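For context, the "adaptive PGD" baseline referenced in the abstract attacks a randomized ensemble by ascending the loss expected under the ensemble's sampling distribution. Below is a minimal sketch of that standard adaptive attack (not the authors' ARC algorithm), assuming a PyTorch setup in which `models` holds the member classifiers and `alphas` their sampling probabilities; these names, the hyperparameters, and the $\ell_\infty$ threat model are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def adaptive_pgd(models, alphas, x, y, eps, steps=10, step_size=None):
    """Adaptive PGD against a randomized ensemble (illustrative sketch).

    At inference the ensemble samples models[i] with probability alphas[i],
    so the adaptive attack ascends the *expected* cross-entropy loss over
    that sampling distribution, under an ell_inf budget of eps.
    """
    step_size = step_size or 2.5 * eps / steps
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        # Expected loss of the randomized ensemble under sampling probs alphas.
        loss = sum(a * F.cross_entropy(m(x + delta), y)
                   for m, a in zip(models, alphas))
        loss.backward()
        with torch.no_grad():
            # ell_inf ascent step, then projection onto the eps-ball.
            delta += step_size * delta.grad.sign()
            delta.clamp_(-eps, eps)
            # Keep the perturbed input inside the valid image range [0, 1].
            delta.copy_((x + delta).clamp(0, 1) - x)
        delta.grad.zero_()
    return (x + delta).detach()
```

Per the abstract, maximizing this expected loss can still leave a randomized ensemble looking deceptively robust; that gap between the expected-loss objective and the true risk of the sampled classifier is precisely what motivates the ARC attack.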