SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification

Paper Title

SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification

Authors

Koluguri, Nithin Rao; Li, Jason; Lavrukhin, Vitaly; Ginsburg, Boris

Abstract

We propose SpeakerNet, a new neural architecture for speaker recognition and speaker verification tasks. It is composed of residual blocks with 1D depth-wise separable convolutions, batch normalization, and ReLU layers. The architecture uses an x-vector-based statistics pooling layer to map variable-length utterances to a fixed-length embedding (q-vector). SpeakerNet-M is a simple lightweight model with just 5M parameters. It does not use voice activity detection (VAD) and achieves close to state-of-the-art performance, scoring an Equal Error Rate (EER) of 2.10% on the VoxCeleb1 cleaned and 2.29% on the VoxCeleb1 trial files.
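
The statistics pooling step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `statistics_pooling`, the list-of-lists input layout (frames by channels), and the use of the population standard deviation are assumptions made for illustration. The sketch only shows how utterances of different lengths collapse into vectors of the same size by concatenating the per-channel mean and standard deviation over time.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool a (T x C) sequence of frame features into a vector of length 2*C.

    The output is the per-channel mean followed by the per-channel
    standard deviation, both computed over the time axis.
    Assumption: population (biased) standard deviation is used here.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    num_channels = len(frames[0])

    # Mean of each channel over all frames.
    means = [
        sum(frame[c] for frame in frames) / num_frames
        for c in range(num_channels)
    ]
    # Standard deviation of each channel over all frames.
    stds = [
        math.sqrt(sum((frame[c] - means[c]) ** 2 for frame in frames) / num_frames)
        for c in range(num_channels)
    ]
    return means + stds


# Two hypothetical utterances with 2 feature channels but different lengths.
short_utt = [[1.0, 2.0], [3.0, 2.0]]                          # 2 frames
long_utt = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]   # 4 frames

short_vec = statistics_pooling(short_utt)
long_vec = statistics_pooling(long_utt)

# Both pooled vectors have length 2 * num_channels, regardless of duration.
print(len(short_vec), len(long_vec))
```

In a full model the pooled vector would typically be passed through further layers to produce the final embedding; that part is omitted here because the abstract does not describe it.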

Paper Title