Paper Title
Distillation from heterogeneous unlabeled collections
Paper Authors
Paper Abstract
Compressing deep networks is essential to expand their range of applications to constrained settings. The need for compression, however, often arises long after the model was trained, when the original data might no longer be available. On the other hand, unlabeled data, not necessarily related to the target task, is usually plentiful, especially for image classification. In this work, we propose a scheme to leverage such samples to distill the knowledge learned by a large teacher network into a smaller student. The proposed technique relies on (i) preferentially sampling data points that appear related to the target task, and (ii) taking better advantage of the learning signal. We show that the former speeds up the student's convergence, while the latter boosts its performance, achieving results close to what could be expected with the original data.
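To make the setting concrete, the sketch below illustrates plain teacher-to-student distillation on unlabeled images (temperature-scaled KL divergence against the teacher's soft predictions), plus a hypothetical relevance proxy based on teacher confidence. This is only an assumption-laden illustration of the general setup described in the abstract; the paper's actual sampling criterion and its refinements to the learning signal are not reproduced here.

```python
# Minimal sketch of distillation on unlabeled data (assumed standard KD recipe,
# not the paper's exact method). Any image can serve as input because the only
# supervision is the teacher's soft prediction; no ground-truth labels are used.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Temperature-scaled KL divergence between teacher and student outputs."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


def relevance_score(teacher_logits):
    """Hypothetical relevance proxy: low teacher entropy suggests an unlabeled
    image resembles the teacher's training distribution. The paper's actual
    criterion for preferential sampling may differ."""
    p = F.softmax(teacher_logits, dim=1)
    entropy = -(p * p.clamp_min(1e-12).log()).sum(dim=1)
    return -entropy  # higher score = presumed more relevant


def distill_step(student, teacher, unlabeled_batch, optimizer, temperature=4.0):
    """One optimization step of the student on a batch of unlabeled images."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(unlabeled_batch)
    student_logits = student(unlabeled_batch)
    loss = distillation_loss(student_logits, teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, `relevance_score` would be used to weight or reorder the unlabeled pool before calling `distill_step`, which is one plausible way to realize the "preferential sampling" idea from the abstract.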