Paper Title
An Interpretable Neuron Embedding for Static Knowledge Distillation
Paper Authors
Paper Abstract
Although deep neural networks have shown strong performance on various tasks, the poor interpretability of these models is frequently criticized. In this paper, we propose a new interpretable neural network method that embeds neurons into a semantic space to extract their intrinsic global semantics. In contrast to previous methods that probe latent knowledge inside the model, the proposed semantic vectors externalize the latent knowledge into static knowledge, which is easy to exploit. Specifically, we assume that neurons with similar activations carry similar semantic information. The semantic vectors are then optimized by continuously aligning activation similarity with semantic-vector similarity during the training of the neural network. Visualizing the semantic vectors allows for a qualitative explanation of the neural network. Moreover, we assess the static knowledge quantitatively through knowledge distillation tasks. Visualization experiments show that the semantic vectors describe neuron activation semantics well. Without sample-by-sample guidance from the teacher model, static knowledge distillation achieves performance comparable or even superior to existing relation-based knowledge distillation methods.
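As a rough illustration of the alignment idea described in the abstract, the sketch below shows one plausible way to align pairwise neuron similarity in activation space with similarity between learnable semantic vectors. This is not the authors' implementation: the choice of cosine similarity, the MSE objective, and the names `activations`, `neuron_embed`, and `alignment_loss` are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the activation/semantic-vector alignment objective;
# the paper's actual similarity measures and loss may differ.

def alignment_loss(activations: torch.Tensor, neuron_embed: torch.Tensor) -> torch.Tensor:
    """Align pairwise neuron similarities in activation space and semantic space.

    activations:  (batch, num_neurons) activations of one layer over a mini-batch.
    neuron_embed: (num_neurons, embed_dim) learnable semantic vector per neuron.
    """
    # Pairwise neuron similarity, measured across the batch dimension.
    act = F.normalize(activations.t(), dim=1)   # (num_neurons, batch)
    act_sim = act @ act.t()                     # (num_neurons, num_neurons)

    # Pairwise similarity of the semantic vectors.
    emb = F.normalize(neuron_embed, dim=1)      # (num_neurons, embed_dim)
    emb_sim = emb @ emb.t()                     # (num_neurons, num_neurons)

    # Encourage the semantic-vector similarity structure to match the
    # (detached) activation similarity structure.
    return F.mse_loss(emb_sim, act_sim.detach())
```

In such a setup, this auxiliary loss would be added to the task loss during training so that the semantic vectors track the evolving activation statistics; detaching the activation similarities keeps the auxiliary objective from perturbing the main network, although the paper may make a different design choice.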