Paper Title
Leveraging Different Learning Styles for Improved Knowledge Distillation in Biomedical Imaging
Paper Authors
Paper Abstract
Learning style refers to the type of training mechanism an individual adopts to gain new knowledge. As suggested by the VARK model, humans have different learning preferences, such as Visual (V), Auditory (A), Read/Write (R), and Kinesthetic (K), for acquiring and effectively processing information. Our work leverages this concept of knowledge diversification to improve the performance of model compression techniques such as Knowledge Distillation (KD) and Mutual Learning (ML). Consequently, we use a single teacher and two student networks in a unified framework that not only allows the transfer of knowledge from teacher to students (KD) but also encourages collaborative learning between the students (ML). Unlike the conventional approach, where the teacher shares the same knowledge, in the form of predictions or feature representations, with every student network, our proposed approach employs a more diversified strategy: one student is trained with the teacher's predictions and the other with the teacher's feature maps. We further extend this knowledge diversification by facilitating the exchange of predictions and feature maps between the two student networks, enriching their learning experiences. We conducted comprehensive experiments on three benchmark datasets for both classification and segmentation tasks, using two different network architecture combinations. The results demonstrate that knowledge diversification in a combined KD and ML framework outperforms conventional KD or ML techniques (with similar network configurations) that use only predictions, with an average improvement of 2%. Furthermore, consistent performance gains across different tasks, with various network architectures, and over state-of-the-art techniques establish the robustness and generalizability of the proposed model.
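To make the diversified objective concrete, below is a minimal PyTorch-style sketch of how such a combined KD + ML loss could be assembled for classification. All names (logits_t, feat_s1, etc.), the specific loss choices (KL divergence for prediction transfer, MSE for feature matching), and the weights are illustrative assumptions, not the authors' exact formulation; in particular, matching student and teacher feature shapes may require an adapter layer in practice.

```python
import torch
import torch.nn.functional as F

def diversified_kd_ml_loss(
    logits_t, feat_t,        # teacher outputs: predictions and feature maps
    logits_s1, feat_s1,      # student 1: receives the teacher's predictions
    logits_s2, feat_s2,      # student 2: receives the teacher's feature maps
    targets,                 # ground-truth class labels
    T=4.0,                   # softmax temperature for prediction transfer (assumed)
    alpha=1.0, beta=1.0, gamma=1.0,  # assumed weights for KD, feature, and ML terms
):
    # Temperature-scaled KL divergence between softened predictions
    # (standard Hinton-style distillation, scaled by T^2).
    def kl(p, q):
        return F.kl_div(
            F.log_softmax(p / T, dim=1),
            F.softmax(q / T, dim=1),
            reduction="batchmean",
        ) * T * T

    # Supervised cross-entropy loss for both students.
    ce = F.cross_entropy(logits_s1, targets) + F.cross_entropy(logits_s2, targets)

    # Diversified teacher-to-student transfer (KD): student 1 mimics the
    # teacher's softened predictions, student 2 mimics the teacher's feature
    # maps (assumes matching shapes; a 1x1 conv adapter may be needed).
    kd_pred = kl(logits_s1, logits_t.detach())
    kd_feat = F.mse_loss(feat_s2, feat_t.detach())

    # Mutual learning (ML): the students exchange both predictions and
    # feature maps, each treating the other's output as a fixed target.
    ml_pred = kl(logits_s1, logits_s2.detach()) + kl(logits_s2, logits_s1.detach())
    ml_feat = F.mse_loss(feat_s1, feat_s2.detach()) + F.mse_loss(feat_s2, feat_s1.detach())

    return ce + alpha * kd_pred + beta * kd_feat + gamma * (ml_pred + ml_feat)
```

The key design point, under these assumptions, is that the two students receive different views of the teacher's knowledge (predictions vs. feature maps) and then diversify further by exchanging both forms with each other, rather than all networks sharing a single type of signal.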