Paper Title

SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification

Authors

Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, Boris Ginsburg

Abstract

We propose SpeakerNet, a new neural architecture for speaker recognition and speaker verification tasks. It is composed of residual blocks with 1D depth-wise separable convolutions, batch normalization, and ReLU layers. The architecture uses an x-vector-based statistics pooling layer to map variable-length utterances to a fixed-length embedding (q-vector). SpeakerNet-M is a simple, lightweight model with just 5M parameters. It does not use voice activity detection (VAD) and achieves close to state-of-the-art performance, with an Equal Error Rate (EER) of 2.10% on the cleaned VoxCeleb1 trial list and 2.29% on the VoxCeleb1 trial files.
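The abstract names three mechanisms: depth-wise separable 1D convolutions inside residual blocks, x-vector-style statistics pooling that turns any number of frames into a fixed-length vector, and EER as the evaluation metric. The pure-Python sketch below illustrates each one under stated assumptions. It is not SpeakerNet's actual implementation. Batch normalization is omitted. The function names, the toy channel counts and kernel sizes, and the 256-channel, kernel-size-3 parameter comparison are all hypothetical illustration choices. The pooled vector stands in for the embedding here, although the real model may add layers after pooling. EER is approximated by a simple threshold sweep, which is one common way to compute it.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # features laid out as [channel][time]


def depthwise_conv1d(x: Matrix, kernels: Matrix) -> Matrix:
    """Convolve each channel with its own 1D kernel (odd length, zero 'same' padding)."""
    out = []
    for channel, kernel in zip(x, kernels):
        pad = len(kernel) // 2
        padded = [0.0] * pad + list(channel) + [0.0] * pad
        out.append([sum(w * padded[t + j] for j, w in enumerate(kernel))
                    for t in range(len(channel))])
    return out


def pointwise_conv1d(x: Matrix, weights: Matrix) -> Matrix:
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o."""
    return [[sum(w_oc * x[c][t] for c, w_oc in enumerate(w_o)) for t in range(len(x[0]))]
            for w_o in weights]


def separable_residual_block(x: Matrix, dw_kernels: Matrix, pw_weights: Matrix) -> Matrix:
    """Depthwise conv -> pointwise conv -> add identity skip -> ReLU (batch norm omitted).

    pw_weights must be square so the output has as many channels as the skip input.
    """
    y = pointwise_conv1d(depthwise_conv1d(x, dw_kernels), pw_weights)
    return [[max(0.0, a + b) for a, b in zip(y_row, x_row)] for y_row, x_row in zip(y, x)]


def statistics_pooling(x: Matrix) -> List[float]:
    """Per-channel mean and standard deviation over time: any T frames -> 2*C values."""
    means = [sum(row) / len(row) for row in x]
    stds = [math.sqrt(sum((v - m) ** 2 for v in row) / len(row)) for row, m in zip(x, means)]
    return means + stds


def conv1d_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k  # standard 1D conv weights, bias ignored


def separable_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * k + c_in * c_out  # depthwise (k weights per channel) + pointwise (1x1)


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold sweep; labels are 1 for target trials, 0 for non-target trials.

    Returns the mean of FAR and FRR at the threshold where the two are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(scores)):
        far = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab == 0) / n_nontarget
        frr = sum(1 for s, lab in zip(scores, labels) if s < thr and lab == 1) / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Two toy utterances with 2 channels and different frame counts (3 vs 5).
    short_utt = [[1.0, 2.0, 3.0], [0.5, -0.5, 1.0]]
    long_utt = [[1.0, 0.0, 2.0, 1.0, 3.0], [0.0, 1.0, 0.0, 1.0, 0.0]]
    dw = [[0.25, 0.5, 0.25], [0.0, 1.0, 0.0]]  # hypothetical depthwise kernels
    pw = [[1.0, 0.0], [0.5, 0.5]]              # hypothetical 1x1 mixing weights
    for utt in (short_utt, long_utt):
        print(len(statistics_pooling(separable_residual_block(utt, dw, pw))))  # 4 both times
    print(conv1d_params(256, 256, 3), separable_conv1d_params(256, 256, 3))  # 196608 66304
    print(equal_error_rate([0.9, 0.8, 0.3, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0]))
```

The parameter comparison shows why separable convolutions keep a model small. At 256 channels and kernel size 3, the separable layer needs roughly a third of the weights of a standard convolution. The pooling step is what lets utterances of any length yield an embedding of the same size.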