Paper title
Attack-Agnostic Adversarial Detection
Paper authors
Paper abstract
The growing number of adversarial attacks in recent years gives attackers an advantage over defenders: defenders must train detectors after learning the attack types, and must maintain many models to ensure good performance in detecting any upcoming attack. We propose to end this tug-of-war between attackers and defenders by treating adversarial attack detection as an anomaly detection problem, so that the detector is agnostic to the attack. We quantify the statistical deviation caused by adversarial perturbations in two respects. The Least Significant Component Feature (LSCF) quantifies the deviation of adversarial examples from the statistics of benign samples, and the Hessian Feature (HF) reflects how adversarial examples distort the landscape of the model's optima by measuring the local loss curvature. Empirical results show that our method achieves an overall ROC AUC of 94.9%, 89.7%, and 94.6% on CIFAR10, CIFAR100, and SVHN, respectively, and performs comparably to adversarial detectors trained with adversarial examples on most of the attacks.
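The abstract does not say how LSCF or HF are computed, so the following is only a toy illustration of the general idea of attack-agnostic detection as anomaly detection: fit statistics on benign features alone, then score new inputs by how far they deviate along the least significant principal directions. Everything here is an assumption made for illustration (the synthetic data, the helper names `fit_benign_stats`, `anomaly_score`, and `roc_auc`, and the choice of `k`), and it is not the paper's feature definition.

```python
import numpy as np

def fit_benign_stats(benign, k):
    """Estimate the mean and the k least significant principal
    directions (smallest-variance directions) of benign features."""
    mean = benign.mean(axis=0)
    cov = np.cov(benign - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return mean, eigvecs[:, :k], eigvals[:k]

def anomaly_score(x, mean, dirs, var, eps=1e-8):
    """Variance-normalized energy along the low-variance directions.
    Benign data has little energy there, so large values are anomalous."""
    proj = (x - mean) @ dirs
    return np.sum(proj ** 2 / (var + eps), axis=1)

def roc_auc(neg_scores, pos_scores):
    """ROC AUC as the probability that a random positive outscores
    a random negative (ties count one half)."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

# Synthetic stand-in for network features (assumption): benign points lie
# close to a 3-dimensional subspace of a 10-dimensional feature space.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 10))

def sample_benign(n):
    return rng.normal(size=(n, 3)) @ mixing + 0.05 * rng.normal(size=(n, 10))

train_benign = sample_benign(2000)
test_benign = sample_benign(500)
# "Perturbed" points: benign points plus off-subspace noise. No attack
# is simulated and no perturbed sample is used for fitting.
test_perturbed = sample_benign(500) + 0.5 * rng.normal(size=(500, 10))

mean, dirs, var = fit_benign_stats(train_benign, k=4)
auc = roc_auc(anomaly_score(test_benign, mean, dirs, var),
              anomaly_score(test_perturbed, mean, dirs, var))
print(f"toy ROC AUC: {auc:.3f}")
```

Because the detector is fitted only on benign samples, the same scoring function applies to any perturbation that pushes features off the benign subspace, which is the sense in which such a detector is agnostic to the attack. The curvature-based HF described in the abstract is not sketched here.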