Optimization of data-driven filterbank for automatic speaker verification

Paper title

Optimization of data-driven filterbank for automatic speaker verification

Authors

Susanta Sarangi, Md Sahidullah, Goutam Saha

Abstract

Most speech processing applications use triangular filters spaced on the mel scale for feature extraction. In this paper, we propose a new data-driven filter design method that optimizes the filter parameters from given speech data. First, we introduce a frame-selection-based approach for developing a speech-signal-based frequency warping scale. Then, we propose a new method for computing the filter frequency responses using principal component analysis (PCA). The main advantage of the proposed method over recently introduced deep-learning-based methods is that it requires only a very limited amount of unlabeled speech data. We demonstrate that the proposed filterbank has more speaker-discriminative power than the commonly used mel filterbank as well as the existing data-driven filterbank. We conduct automatic speaker verification (ASV) experiments on different corpora using various classifier back-ends. We show that the acoustic features created with the proposed filterbank outperform existing mel-frequency cepstral coefficients (MFCCs) and speech-signal-based frequency cepstral coefficients (SFCCs) in most cases. In experiments with VoxCeleb1 and the popular i-vector back-end, we observe a 9.75% relative improvement in equal error rate (EER) over MFCCs. Similarly, the relative improvement is 4.43% with the recently introduced x-vector system. We obtain further improvement by fusing the proposed method with the standard MFCC-based approach.
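The contrast the abstract draws between fixed triangular mel filters and PCA-derived filter responses can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the function names, the use of the first principal component of each subband's log power spectrum as the filter shape, and the reuse of mel band edges in place of the paper's learned warping scale. It should not be read as the authors' implementation.

```python
import numpy as np


def hz_to_mel(f_hz):
    # Common mel-scale formula (O'Shaughnessy variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def triangular_mel_filterbank(n_filters, n_fft, sample_rate):
    """Baseline: triangular filters whose edges are equally spaced in mel."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges_hz = mel_to_hz(mel_edges)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, centre, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (centre - lo)
        falling = (hi - bin_freqs) / (hi - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, edges_hz


def pca_filter_shape(log_power_band):
    """Assumed filter shape: first principal component of the subband data.

    log_power_band has shape (n_frames, n_band_bins).
    """
    centred = log_power_band - log_power_band.mean(axis=0, keepdims=True)
    cov = centred.T @ centred / max(len(centred) - 1, 1)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
    pc1 = eigvecs[:, -1]               # direction of largest variance
    if pc1.sum() < 0:                  # resolve the arbitrary sign of the eigenvector
        pc1 = -pc1
    return pc1 / np.max(np.abs(pc1))   # scale so the peak magnitude is 1


def pca_filterbank(log_power, edges_hz, sample_rate):
    """Replace each band's triangle with a PCA-derived shape (illustrative only)."""
    n_bins = log_power.shape[1]
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    n_filters = len(edges_hz) - 2
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        support = (bin_freqs >= edges_hz[i]) & (bin_freqs <= edges_hz[i + 2])
        if support.sum() >= 2:
            bank[i, support] = pca_filter_shape(log_power[:, support])
        elif support.sum() == 1:
            bank[i, support] = 1.0
    return bank


def relative_improvement(baseline_eer, new_eer):
    """Relative EER improvement in percent, as reported in the abstract."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

Here `log_power` stands for a matrix of log power spectra with one row per selected speech frame. Because the shapes are estimated from the spectra alone, the sketch needs no speaker labels, which matches the abstract's point about unlabeled data.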
