Paper title
The ins and outs of speaker recognition: lessons from VoxSRC 2020
Paper authors
Paper abstract
The VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020 offers a challenging evaluation for speaker recognition systems, which includes celebrities playing different parts in movies. The goal of this work is robust speaker recognition of utterances recorded in these challenging environments. We utilise variants of the popular ResNet architecture for speaker recognition and perform extensive experiments using a range of loss functions and training parameters. To this end, we optimise an efficient training framework that allows powerful models to be trained with limited time and resources. Our trained models demonstrate improvements over most existing works with lighter models and a simple pipeline. The paper shares the lessons learned from our participation in the challenge.