Paper Title
Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition
Paper Authors
Paper Abstract
Very deep models for speaker recognition (SR) have demonstrated remarkable performance improvements in recent research. However, it is impractical to deploy these models for on-device applications with constrained computational resources. On the other hand, light-weight models are highly desired in practice despite their sub-optimal performance. This research aims to improve light-weight SR models through large-scale label-free knowledge distillation (KD). Existing KD approaches for SR typically require speaker labels to learn task-specific knowledge, due to the inefficiency of conventional distillation losses. To address this inefficiency and achieve label-free KD, we propose to employ the contrastive loss from self-supervised learning for distillation. Extensive experiments are conducted on a collection of public speech datasets from diverse sources. Results on light-weight SR models show that the proposed approach of label-free KD with contrastive loss consistently outperforms both conventional distillation methods and self-supervised learning methods by a significant margin.
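As a rough illustration of how a contrastive loss can drive label-free distillation, the sketch below computes an InfoNCE-style objective between teacher and student speaker embeddings: the two embeddings of the same utterance form a positive pair, and the other utterances in the batch serve as negatives, so no speaker labels are required. The function name, the temperature value, and the batch setup are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def contrastive_distillation_loss(student_emb, teacher_emb, temperature=0.07):
    """InfoNCE-style distillation loss (illustrative sketch).

    student_emb, teacher_emb: (batch, dim) speaker embeddings for the
    same batch of utterances. A student embedding is pulled toward the
    teacher embedding of the same utterance and pushed away from the
    teacher embeddings of the other utterances in the batch.
    """
    # L2-normalise so the dot products below are cosine similarities.
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)

    # Pairwise similarity between every student and every teacher embedding.
    logits = s @ t.t() / temperature  # shape: (batch, batch)

    # Diagonal entries are the positive (same-utterance) pairs;
    # no speaker labels are involved.
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)


# Minimal usage sketch: the teacher is frozen, only the student is trained.
if __name__ == "__main__":
    batch, dim = 8, 256
    teacher_emb = torch.randn(batch, dim)                    # from a frozen deep teacher
    student_emb = torch.randn(batch, dim, requires_grad=True)  # from a light-weight student
    loss = contrastive_distillation_loss(student_emb, teacher_emb)
    loss.backward()
    print(loss.item())
```

Compared with a plain L2 regression onto teacher embeddings, this contrastive form uses in-batch negatives, which is one way a distillation signal can be made more discriminative without any labels; the actual loss and training setup used in the paper may differ.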