An Adaptive X-vector Model for Text-independent Speaker Verification

Paper Title

An Adaptive X-vector Model for Text-independent Speaker Verification

Authors

Gu, Bin; Guo, Wu; Dai, Lirong; Du, Jun

Abstract

In this paper, adaptive mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, adaptive convolutional neural networks (ACNNs) are employed in the frame-level embedding layers, where the parameters of the convolution filters are adjusted based on the input features. Compared with conventional CNNs, ACNNs have more flexibility in capturing speaker information. Moreover, we replace conventional batch normalization (BN) with adaptive batch normalization (ABN). By dynamically generating the scaling and shifting parameters in BN, ABN adapts the model to the acoustic variability arising from factors such as channel and environmental noise. Finally, we combine these two methods to further improve performance. Experiments are carried out on the Speakers in the Wild (SITW) and VOiCES databases. The results demonstrate that the proposed methods significantly outperform the original x-vector approach.
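
The idea of generating the BN scaling and shifting parameters from the input can be illustrated with a minimal sketch. The abstract does not say how ABN produces these parameters, so everything specific here is an assumption made for illustration: the function name `adaptive_batch_norm`, the weight matrices `w_gamma` and `w_beta`, the use of the per-channel mean as the summary of the input, and the use of a single utterance's frames for the normalization statistics instead of a training batch.

```python
import math


def adaptive_batch_norm(frames, w_gamma, w_beta, eps=1e-5):
    """Normalize each channel, then scale and shift with input-dependent parameters.

    frames  : list of T frames, each a list of C channel values
    w_gamma : C x C weights (hypothetical) mapping the input summary to a scale offset
    w_beta  : C x C weights (hypothetical) mapping the input summary to a shift
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    channels = range(num_channels)

    # Per-channel mean and variance over the frames of this input.
    mean = [sum(f[c] for f in frames) / num_frames for c in channels]
    var = [sum((f[c] - mean[c]) ** 2 for f in frames) / num_frames for c in channels]

    # Assumption: the per-channel mean serves as the summary of the input.
    # gamma = 1 + W_gamma * summary, beta = W_beta * summary, so zero weights
    # reduce the layer to plain normalization (gamma = 1, beta = 0).
    gamma = [1.0 + sum(w_gamma[c][k] * mean[k] for k in channels) for c in channels]
    beta = [sum(w_beta[c][k] * mean[k] for k in channels) for c in channels]

    return [
        [gamma[c] * (f[c] - mean[c]) / math.sqrt(var[c] + eps) + beta[c] for c in channels]
        for f in frames
    ]


if __name__ == "__main__":
    # Three frames with two channels each (toy values).
    toy_frames = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
    toy_w_gamma = [[0.0, 0.5], [0.0, 0.0]]
    toy_w_beta = [[1.0, 0.0], [0.0, 0.0]]
    for row in adaptive_batch_norm(toy_frames, toy_w_gamma, toy_w_beta):
        print(row)
```

In conventional BN the scale and shift are fixed learned vectors shared by all inputs; in this sketch they change with each input, which is the property the abstract attributes to ABN. A trained system would learn the generating weights jointly with the rest of the network.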
