Paper title

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

Authors

Federico Landini, Ján Profant, Mireia Diez, Lukáš Burget

Abstract

The recently proposed VBx diarization method uses a Bayesian hidden Markov model to find speaker clusters in a sequence of x-vectors. In this work we perform an extensive comparison of the performance of VBx diarization with other approaches in the literature, and we show that VBx achieves superior performance on three of the most popular datasets for evaluating diarization: CALLHOME, AMI and DIHARD II. Further, we present for the first time the derivation and update formulae for the VBx model, focusing on the efficiency and simplicity of this model compared to the previous, more complex BHMM model that works on frame-by-frame standard cepstral features. Together with this publication, we release the recipe for training the x-vector extractors used in our experiments on both wideband and narrowband data, and the VBx recipes that attain state-of-the-art performance on all three datasets. In addition, we point out the lack of a standardized evaluation protocol for the AMI dataset, and we propose a new protocol for both Beamformed and Mix-Headset audio based on the official AMI partitions and transcriptions.
