Paper Title
Improving the Certified Robustness of Neural Networks via Consistency Regularization
Authors
Abstract
A range of defense methods have been proposed to improve the robustness of neural networks against adversarial examples, among which provable defense methods have been shown to be effective in training neural networks that are certifiably robust against attacks. However, most of these provable defense methods treat all examples equally during training, ignoring the inconsistent constraint of certified robustness between correctly classified (natural) and misclassified examples. In this paper, we explore the inconsistency caused by misclassified examples and add a novel consistency regularization term to make better use of them. Specifically, we identify that the certified robustness of a network can be significantly improved if the certified-robustness constraints on misclassified examples and correctly classified examples are consistent. Motivated by this finding, we design a new defense regularization term called Misclassification Aware Adversarial Regularization (MAAR), which constrains the output probability distributions of all examples within the certified region of each misclassified example. Experimental results show that our proposed MAAR achieves the best certified robustness and comparable accuracy on the CIFAR-10 and MNIST datasets in comparison with several state-of-the-art methods.
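To make the idea concrete, below is a minimal PyTorch sketch of how a misclassification-aware consistency penalty could look. It is an illustration under assumptions, not the paper's exact formulation: the certified region is approximated here by random samples inside an L-infinity ball of radius epsilon, the consistency constraint is expressed as a KL divergence to the clean output distribution, and the function name maar_regularizer and the parameters epsilon and num_samples are hypothetical.

```python
import torch
import torch.nn.functional as F

def maar_regularizer(model, x, y, epsilon, num_samples=4):
    """Hypothetical sketch of a misclassification-aware consistency penalty.

    For examples the model currently misclassifies, penalize the KL divergence
    between the clean output distribution and the output distributions of
    random points inside the L-inf ball of radius `epsilon`, used here as a
    stand-in for the certified region around each misclassified example.
    """
    with torch.no_grad():
        clean_logits = model(x)
        # Boolean mask selecting the examples the model gets wrong.
        misclassified = clean_logits.argmax(dim=1) != y
    if not misclassified.any():
        return x.new_zeros(())

    x_mis = x[misclassified]
    clean_dist = F.softmax(model(x_mis), dim=1)

    penalty = x.new_zeros(())
    for _ in range(num_samples):
        # Sample a perturbation inside the epsilon ball (inputs assumed in [0, 1]).
        delta = torch.empty_like(x_mis).uniform_(-epsilon, epsilon)
        x_pert = (x_mis + delta).clamp(0.0, 1.0)
        pert_log_dist = F.log_softmax(model(x_pert), dim=1)
        # KL(clean || perturbed): encourages consistent outputs over the region.
        penalty = penalty + F.kl_div(pert_log_dist, clean_dist, reduction="batchmean")
    return penalty / num_samples
```

In training, such a term would typically be added to the standard certified-training objective with a weighting coefficient, e.g. loss = certified_loss + lam * maar_regularizer(model, x, y, epsilon), where lam trades off certified robustness against natural accuracy.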