LEAP System for SRE19 CTS Challenge -- Improvements and Error Analysis

Paper title

LEAP System for SRE19 CTS Challenge -- Improvements and Error Analysis

Authors

Shreyas Ramoji, Prashant Krishnan, Bhargavram Mysore, Prachi Singh, Sriram Ganapathy

Abstract

The NIST Speaker Recognition Evaluation - Conversational Telephone Speech (CTS) challenge 2019 was an open evaluation for the task of speaker verification in challenging conditions. In this paper, we provide a detailed account of the LEAP SRE system submitted to the CTS challenge focusing on the novel components in the back-end system modeling. All the systems used the time-delay neural network (TDNN) based x-vector embeddings. The x-vector system in our SRE19 submission used a large pool of training speakers (about 14k speakers). Following the x-vector extraction, we explored a neural network approach to backend score computation that was optimized for a speaker verification cost. The system combination of generative and neural PLDA models resulted in significant improvements for the SRE evaluation dataset. We also found additional gains for the SRE systems based on score normalization and calibration. Subsequent to the evaluations, we have performed a detailed analysis of the submitted systems. The analysis revealed the incremental gains obtained for different training dataset combinations as well as the modeling methods.
