Paper Title

Your Out-of-Distribution Detection Method is Not Robust!

Paper Authors

Mohammad Azizmalayeri, Arshia Soltani Moakhar, Arman Zarei, Reihaneh Zohrabi, Mohammad Taghi Manzuri, Mohammad Hossein Rohban

Paper Abstract

Out-of-distribution (OOD) detection has recently gained substantial attention due to the importance of identifying out-of-domain samples for reliability and safety. Although OOD detection methods have advanced by a great deal, they are still susceptible to adversarial examples, which is a violation of their purpose. To mitigate this issue, several defenses have recently been proposed. Nevertheless, these efforts remain ineffective, as their evaluations are based on either small perturbation sizes or weak attacks. In this work, we re-examine these defenses against an end-to-end PGD attack on in/out data with larger perturbation sizes, e.g., up to the commonly used $ε=8/255$ for the CIFAR-10 dataset. Surprisingly, almost all of these defenses perform worse than random detection under the adversarial setting. Next, we aim to provide a robust OOD detection method. In an ideal defense, the training should expose the model to almost all possible adversarial perturbations, which can be achieved through adversarial training. That is, such training perturbations should be based on both in- and out-of-distribution samples. Therefore, unlike OOD detection in the standard setting, access to OOD samples, as well as in-distribution ones, seems necessary in the adversarial training setup. These observations lead us to adopt generative OOD detection methods, such as OpenGAN, as a baseline. We subsequently propose the Adversarially Trained Discriminator (ATD), which utilizes a pre-trained robust model to extract robust features and a generator model to create OOD samples. Using ATD with CIFAR-10 and CIFAR-100 as the in-distribution data, we significantly outperform all previous methods in robust AUROC while maintaining high standard AUROC and classification accuracy. The code repository is available at https://github.com/rohban-lab/ATD.
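The end-to-end PGD attack described in the abstract can be illustrated with a short sketch. This is not the authors' code: `detector` is a hypothetical callable returning one OOD score per sample (higher meaning "more OOD-like"), and the attack perturbs in-distribution inputs to raise that score and OOD inputs to lower it, inside an $\ell_\infty$ ball of radius $ε=8/255$.

```python
import torch

def pgd_attack_detector(detector, x, is_in_dist, eps=8/255, alpha=2/255, steps=100):
    """End-to-end PGD against an OOD detector (illustrative sketch, not the
    paper's exact attack). `detector(x)` is assumed to return one OOD score
    per sample, with higher values meaning "more OOD-like"."""
    # +1 for in-distribution inputs (push score up, toward "OOD"),
    # -1 for OOD inputs (push score down, toward "in-distribution").
    direction = is_in_dist.float() * 2.0 - 1.0
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = (direction * detector(x_adv)).sum()       # signed score to maximize
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()              # gradient-sign ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project
    return x_adv.detach()
```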
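Robust AUROC, the metric the paper reports, attacks both populations before scoring. Below is a minimal sketch under the same assumptions, reusing `pgd_attack_detector` from above; `roc_auc_score` treats OOD as the positive class.

```python
import torch
from sklearn.metrics import roc_auc_score

def robust_auroc(detector, x_in, x_out, eps=8/255):
    """AUROC after end-to-end PGD on both in- and out-of-distribution data."""
    in_mask = torch.ones(len(x_in), dtype=torch.bool)
    out_mask = torch.zeros(len(x_out), dtype=torch.bool)
    x_in_adv = pgd_attack_detector(detector, x_in, in_mask, eps=eps)
    x_out_adv = pgd_attack_detector(detector, x_out, out_mask, eps=eps)
    with torch.no_grad():
        scores = torch.cat([detector(x_in_adv), detector(x_out_adv)])
    labels = torch.cat([torch.zeros(len(x_in)), torch.ones(len(x_out))])  # 1 = OOD
    return roc_auc_score(labels.numpy(), scores.numpy())
```

A detector at chance level scores 0.5 under this evaluation; "worse than random detection" in the abstract means the attacked defenses fall below that mark.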
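The ATD design can likewise be sketched at a high level, as the abstract describes it: a frozen, adversarially pre-trained backbone supplies robust features, a GAN generator supplies synthetic OOD samples, and the discriminator is trained on PGD-perturbed inputs. All names below (`robust_backbone`, `generator`, `discriminator`) are placeholders rather than the repository's API; see https://github.com/rohban-lab/ATD for the actual implementation.

```python
import torch
import torch.nn.functional as F

def atd_discriminator_step(robust_backbone, generator, discriminator,
                           x_in, opt_d, eps=8/255, z_dim=128):
    """One discriminator update in the spirit of ATD (simplified sketch).
    The discriminator maps frozen robust features to an in-distribution
    logit; generated images stand in for OOD training samples."""
    b = x_in.size(0)
    d_logit = lambda x: discriminator(robust_backbone(x)).squeeze(-1)
    # pgd_attack_detector expects "higher = more OOD", so negate the logit.
    x_in_adv = pgd_attack_detector(lambda x: -d_logit(x), x_in,
                                   torch.ones(b, dtype=torch.bool),
                                   eps=eps, steps=10)
    z = torch.randn(b, z_dim)
    x_fake = generator(z).detach()           # synthetic OOD-like images
    loss = (F.binary_cross_entropy_with_logits(d_logit(x_in_adv), torch.ones(b)) +
            F.binary_cross_entropy_with_logits(d_logit(x_fake), torch.zeros(b)))
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return loss.item()
```

The generator's own GAN-style update (training it to fool the discriminator) and the handling of any auxiliary OOD exposure data are omitted here for brevity.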
