Paper Title
ARDIR: Improving Robustness using Knowledge Distillation of Internal Representation
Paper Authors
Paper Abstract
Adversarial training is the most promising method for learning models that are robust against adversarial examples. A recent study has shown that knowledge distillation between models of the same architecture is effective in improving the performance of adversarial training. Exploiting knowledge distillation is a new approach to improving adversarial training and has attracted much attention. However, its performance is still insufficient. Therefore, we propose Adversarial Robust Distillation with Internal Representation (ARDIR) to utilize knowledge distillation even more effectively. In addition to the teacher model's output, ARDIR uses the teacher model's internal representation as a label for adversarial training. This enables the student model to be trained with richer, more informative labels. As a result, ARDIR can learn more robust student models. We show that ARDIR outperforms previous methods in our experiments.
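The abstract does not specify ARDIR's exact objective, but the core idea of supervising the student with both the teacher's outputs and its internal representation can be sketched as a combined distillation loss. The following is a minimal, hypothetical PyTorch sketch, not the paper's formulation: the `(logits, features)` model interface, the distillation temperature, and the weight `lam` are all assumptions introduced for illustration.

```python
import torch
import torch.nn.functional as F

def ardir_style_loss(student, teacher, x_adv, temperature=4.0, lam=1.0):
    """Sketch of a distillation loss that supervises the student with both
    the teacher's soft outputs and an internal (e.g. penultimate-layer)
    representation. Both models are assumed to return (logits, features);
    this interface is hypothetical -- the abstract does not specify it."""
    with torch.no_grad():
        t_logits, t_feat = teacher(x_adv)   # teacher's labels are fixed
    s_logits, s_feat = student(x_adv)

    # Output-level distillation: KL divergence between temperature-softened
    # teacher and student distributions on the adversarial example.
    kd = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Internal-representation term: match the teacher's features, giving the
    # student a richer training signal than the output logits alone.
    feat = F.mse_loss(s_feat, t_feat)

    return kd + lam * feat
```

In a training loop, `x_adv` would be generated by an attack such as PGD against the student, and this loss would replace (or augment) the standard adversarial-training cross-entropy; the balance `lam` between the two terms is a tunable assumption here.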