Paper Title

Cross-Layer Distillation with Semantic Calibration

Paper Authors

Chen, Defang; Mei, Jian-Ping; Zhang, Yuan; Wang, Can; Feng, Yan; Chen, Chun

Paper Abstract

Knowledge distillation is a technique to enhance the generalization ability of a student model by exploiting outputs from a teacher model. Recently, feature-map based variants explore knowledge transfer between manually assigned teacher-student pairs in intermediate layers for further improvement. However, layer semantics may vary in different neural networks, and semantic mismatch in manual layer associations will lead to performance degeneration due to negative regularization. To address this issue, we propose Semantic Calibration for cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model for each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple teacher layers rather than a specific intermediate layer for appropriate cross-layer supervision. We further provide theoretical analysis of the association weights and conduct extensive experiments to demonstrate the effectiveness of our approach. Code is available at \url{https://github.com/DefangChen/SemCKD}.
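As a rough illustration of the cross-layer attention idea described in the abstract, the sketch below computes, for each student feature map, a softmax distribution over all teacher feature maps and uses it to weight per-layer feature-matching losses, so no fixed one-to-one layer pairing is needed. The module name `CrossLayerAttentionKD`, the pooled query/key projections, and the 1x1 channel alignment are illustrative assumptions, not the authors' exact SemCKD implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerAttentionKD(nn.Module):
    """Minimal sketch of attention-weighted cross-layer feature distillation.

    For each student layer, attention weights over all teacher layers are
    computed from globally pooled feature statistics; the feature-matching
    loss is a weighted sum over teacher layers instead of a fixed pairing.
    All dimensions and projections here are illustrative assumptions.
    """

    def __init__(self, s_channels, t_channels, d_model=128):
        super().__init__()
        # Query/key projections applied to globally pooled features.
        self.query = nn.ModuleList(nn.Linear(c, d_model) for c in s_channels)
        self.key = nn.ModuleList(nn.Linear(c, d_model) for c in t_channels)
        # For each (student, teacher) pair, a 1x1 conv aligns channel counts.
        self.align = nn.ModuleList(
            nn.ModuleList(nn.Conv2d(cs, ct, kernel_size=1) for ct in t_channels)
            for cs in s_channels
        )

    def forward(self, feats_s, feats_t):
        # feats_s / feats_t: lists of intermediate feature maps, each (N, C, H, W).
        loss = 0.0
        for i, fs in enumerate(feats_s):
            # Attention logits between this student layer and every teacher layer.
            q = self.query[i](F.adaptive_avg_pool2d(fs, 1).flatten(1))        # (N, d)
            keys = torch.stack(
                [self.key[j](F.adaptive_avg_pool2d(ft, 1).flatten(1))
                 for j, ft in enumerate(feats_t)], dim=1)                      # (N, T, d)
            attn = F.softmax((keys @ q.unsqueeze(-1)).squeeze(-1)
                             / q.size(-1) ** 0.5, dim=1)                       # (N, T)
            for j, ft in enumerate(feats_t):
                # Align channels and spatial size before the feature-matching loss.
                fs_aligned = self.align[i][j](fs)
                fs_aligned = F.interpolate(fs_aligned, size=ft.shape[-2:],
                                           mode='bilinear', align_corners=False)
                per_sample = F.mse_loss(fs_aligned, ft,
                                        reduction='none').mean(dim=(1, 2, 3))  # (N,)
                loss = loss + (attn[:, j] * per_sample).mean()
        return loss
```

Because the attention weights for each student layer sum to one, the total feature loss stays on roughly the same scale as a hand-picked one-to-one pairing while letting the layer associations be learned from data.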
