Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

Paper title

Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

Authors

Lepage, Théo; Dehak, Réda

Abstract

State-of-the-art speaker verification systems inherently depend on some form of human supervision, as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and does not scale to the amount of data available today. In this study, we explore self-supervised learning for speaker verification by learning representations directly from raw audio. The objective is to produce robust speaker embeddings with small intra-speaker and large inter-speaker variance. Our approach is based on recent information maximization learning frameworks and an intensive data augmentation pre-processing step. We first evaluate the ability of these methods to work without contrastive samples, then show that they achieve better performance when combined with a contrastive loss. Furthermore, our experiments show that our method reaches competitive results compared to existing techniques and can outperform a supervised baseline when fine-tuned on a small portion of labeled data.
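The abstract combines an information maximization objective with a contrastive loss but does not spell out either one here. As an illustration only, the sketch below assumes VICReg (variance, invariance and covariance regularization) as the information maximization term and a symmetric InfoNCE loss (a simplified, cross-view-only variant of SimCLR's NT-Xent) as the contrastive term, both applied to embeddings of two augmented segments of the same utterance. The function names, the temperature, and `contrastive_weight` are hypothetical; the 25/25/1 term weights are the VICReg paper's defaults, not necessarily the authors' settings.

```python
import numpy as np


def vicreg_loss(z_a, z_b, inv_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style loss on two (n, d) embedding batches (assumed formulation)."""
    n, d = z_a.shape

    # Invariance: embeddings of the two augmented views should match.
    invariance = np.mean((z_a - z_b) ** 2)

    def variance_term(z):
        # Hinge on each dimension's std to keep it near 1 and prevent collapse.
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def covariance_term(z):
        # Penalize off-diagonal covariance to decorrelate embedding dimensions.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    variance = variance_term(z_a) + variance_term(z_b)
    covariance = covariance_term(z_a) + covariance_term(z_b)
    return inv_w * invariance + var_w * variance + cov_w * covariance


def info_nce_loss(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE with cross-view negatives (simplified NT-Xent)."""
    a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    # Cosine similarities: row i has its positive at column i, negatives elsewhere.
    logits = a @ b.T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))


def combined_loss(z_a, z_b, contrastive_weight=1.0, temperature=0.07):
    # Hypothetical weighting of the information maximization and contrastive terms.
    return vicreg_loss(z_a, z_b) + contrastive_weight * info_nce_loss(z_a, z_b, temperature)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for embeddings of two augmented segments per utterance.
    z_a = rng.normal(size=(32, 64))
    z_b = z_a + 0.1 * rng.normal(size=(32, 64))
    print("VICReg:", vicreg_loss(z_a, z_b))
    print("InfoNCE:", info_nce_loss(z_a, z_b))
    print("combined:", combined_loss(z_a, z_b))
```

In this sketch, the variance term is what lets the VICReg objective avoid collapsing to a constant embedding without any negative samples. The contrastive term adds explicit negatives drawn from the other utterances in the batch, which, without labels, are only assumed to come from different speakers.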