Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding

Paper Title

Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding

Authors

Sarkar, Achintya Kr.; Tan, Zheng-Hua

Abstract

In this letter, we propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV), in which a set of TD-SV systems is trained, one for each VTL factor, and score-level fusion is applied to make the final decision. Next, we explore the bottleneck (BN) feature extracted by training deep neural networks with a self-supervised objective, autoregressive predictive coding (APC), for TD-SV and compare it with the well-studied speaker-discriminant BN feature. The proposed VTL method is then applied to both the APC and the speaker-discriminant BN features. Finally, we combine the VTL perturbation systems trained on MFCCs and on the two BN features in the score domain. Experiments are performed on the RedDots Challenge 2016 TD-SV database, which uses short utterances, with Gaussian mixture model-universal background model (GMM-UBM) and i-vector techniques. Results show that the proposed methods significantly outperform the baselines.
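The VTL perturbation scheme summarized above (one TD-SV system per VTL factor, combined by score-level fusion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the warping function is the piecewise-linear VTLP form of Jaitly and Hinton, and the cut-off `f_hi`, the factors in `alphas` and the toy trial scores are hypothetical. Fusion here is an equal-weight average of z-normalized scores, which is also an assumption; the letter may weight or calibrate the systems differently. In a full system, the warped frequencies would set the filter positions used during feature extraction for each VTL factor.

```python
import numpy as np


def vtlp_warp(freqs, alpha, sample_rate=16000, f_hi=4800.0):
    """Piecewise-linear VTL warping of frequencies in Hz (assumed VTLP form).

    Frequencies up to a boundary are scaled by `alpha`; above it, a second
    linear segment keeps the Nyquist frequency fixed. `f_hi` is a hypothetical
    cut-off, not a value taken from the letter.
    """
    freqs = np.asarray(freqs, dtype=float)
    nyq = sample_rate / 2.0
    knee = f_hi * min(alpha, 1.0)           # warped frequency at the boundary
    boundary = knee / alpha                 # input frequency where segments meet
    upper = nyq - (nyq - knee) / (nyq - boundary) * (nyq - freqs)
    return np.where(freqs <= boundary, freqs * alpha, upper)


def fuse_scores(score_lists):
    """Equal-weight score-level fusion (assumption), one row per VTL system.

    Each system's scores are z-normalized over the trial list first so that
    systems with different score ranges contribute comparably.
    """
    scores = np.asarray(score_lists, dtype=float)  # (n_systems, n_trials)
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, keepdims=True)
    std[std == 0] = 1.0                     # guard against a constant-score system
    return ((scores - mean) / std).mean(axis=0)


# Hypothetical VTL factors; one TD-SV system would be trained per factor.
alphas = [0.9, 1.0, 1.1]

# Example filter centre frequencies (Hz) and their warped versions per factor.
centre_freqs = np.linspace(0.0, 8000.0, 5)
warped = {a: vtlp_warp(centre_freqs, a) for a in alphas}

# Toy verification scores for three trials, one row per VTL system
# (illustrative numbers only, not results from the letter).
trial_scores = [
    [2.1, -0.5, 1.7],   # system trained with alpha = 0.9
    [1.8, -0.9, 1.2],   # system trained with alpha = 1.0
    [2.4, -0.2, 2.0],   # system trained with alpha = 1.1
]
fused = fuse_scores(trial_scores)
decisions = fused > 0.0                     # accept/reject threshold is an assumption
```

With `alpha = 1.0` the warp is the identity, so the unperturbed system is one member of the ensemble; the other factors stretch or compress the low-frequency region while leaving the band edges fixed.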