Paper Title
ARDIR: Improving Robustness using Knowledge Distillation of Internal Representation
Paper Authors
Paper Abstract
Adversarial training is the most promising method for learning models that are robust against adversarial examples. A recent study has shown that knowledge distillation between models of the same architecture is effective in improving the performance of adversarial training. Exploiting knowledge distillation is a new approach to improving adversarial training and has attracted much attention. However, its performance is still insufficient. Therefore, we propose Adversarial Robust Distillation with Internal Representation (ARDIR) to utilize knowledge distillation even more effectively. In addition to the teacher model's output, ARDIR uses the teacher model's internal representation as a label for adversarial training. This enables the student model to be trained with richer, more informative labels. As a result, ARDIR can learn more robust student models. We show that ARDIR outperforms previous methods in our experiments.
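The abstract does not specify ARDIR's exact objective, but the core idea of supervising the student with both the teacher's outputs and its internal representation can be sketched as a combined distillation loss. The following is a minimal, hypothetical PyTorch sketch, not the paper's formulation: the `(logits, features)` model interface, the distillation temperature, and the weight `lam` are all assumptions introduced for illustration.

```python
import torch
import torch.nn.functional as F

def ardir_style_loss(student, teacher, x_adv, temperature=4.0, lam=1.0):
    """Sketch of a distillation loss that supervises the student with both
    the teacher's soft outputs and an internal (e.g. penultimate-layer)
    representation. Both models are assumed to return (logits, features);
    this interface is hypothetical -- the abstract does not specify it."""
    with torch.no_grad():
        t_logits, t_feat = teacher(x_adv)   # teacher's labels are fixed
    s_logits, s_feat = student(x_adv)

    # Output-level distillation: KL divergence between temperature-softened
    # teacher and student distributions on the adversarial example.
    kd = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Internal-representation term: match the teacher's features, giving the
    # student a richer training signal than the output logits alone.
    feat = F.mse_loss(s_feat, t_feat)

    return kd + lam * feat
```

In a training loop, `x_adv` would be generated by an attack such as PGD against the student, and this loss would replace (or augment) the standard adversarial-training cross-entropy; the balance `lam` between the two terms is a tunable assumption here.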