Paper Title


Black-box Few-shot Knowledge Distillation

Authors

Dang Nguyen, Sunil Gupta, Kien Do, Svetha Venkatesh

Abstract


Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network. Traditional KD methods require lots of labeled training samples and a white-box teacher (parameters are accessible) to train a good student. However, these resources are not always available in real-world applications. The distillation process often happens at an external party side where we do not have access to much data, and the teacher does not disclose its parameters due to security and privacy concerns. To overcome these challenges, we propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher. Our main idea is to expand the training set by generating a diverse set of out-of-distribution synthetic images using MixUp and a conditional variational auto-encoder. These synthetic images along with their labels obtained from the teacher are used to train the student. We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks. The code and models are available at: https://github.com/nphdang/FS-BBT
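The core idea can be summarized in a few steps: expand the handful of unlabeled images into a larger synthetic set, query the black-box teacher for labels on that set, and train the student against those labels. Below is a minimal PyTorch sketch of this pipeline under stated assumptions: it covers only the MixUp branch (the conditional VAE branch is omitted), and `query_teacher`, the `teacher_api` callable, the hyper-parameters, and the loss choice are illustrative placeholders rather than the authors' exact implementation.

```python
# Hedged sketch of black-box few-shot KD (MixUp branch only).
# Assumptions: `teacher_api` is a black-box callable returning class
# probabilities; `student` is any nn.Module classifier; all hyper-parameters
# are illustrative, not the paper's exact settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mixup_expand(images, alpha=1.0, n_synthetic=1000):
    """Create out-of-distribution synthetic images by convexly mixing
    random pairs of the few available unlabeled images (MixUp)."""
    n = images.size(0)
    idx1 = torch.randint(0, n, (n_synthetic,))
    idx2 = torch.randint(0, n, (n_synthetic,))
    lam = torch.distributions.Beta(alpha, alpha).sample((n_synthetic, 1, 1, 1))
    return lam * images[idx1] + (1.0 - lam) * images[idx2]

def query_teacher(teacher_api, images):
    """Black-box access: we only observe the teacher's output probabilities,
    never its parameters or gradients."""
    with torch.no_grad():
        return teacher_api(images)  # shape: (num_images, num_classes)

def train_student(student, teacher_api, few_images, epochs=10, batch_size=64):
    """Expand the tiny unlabeled set, label it with the teacher, and train
    the student to match the teacher's soft labels."""
    synthetic = mixup_expand(few_images)
    data = torch.cat([few_images, synthetic], dim=0)
    soft_labels = query_teacher(teacher_api, data)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        perm = torch.randperm(data.size(0))
        for start in range(0, data.size(0), batch_size):
            idx = perm[start:start + batch_size]
            logits = student(data[idx])
            # Match the student's predictive distribution to the teacher's
            # soft labels (KL divergence as a stand-in distillation loss).
            loss = F.kl_div(F.log_softmax(logits, dim=1),
                            soft_labels[idx], reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```

In this sketch the only interaction with the teacher is the single call in `query_teacher`, which reflects the black-box constraint described in the abstract: no parameters or gradients are needed, only the teacher's predicted labels on the synthetic images.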
