Detecting Vocal Fatigue with Neural Embeddings

Paper Title

Detecting Vocal Fatigue with Neural Embeddings

Authors

Bayerl, Sebastian P.; Wagner, Dominik; Baumann, Ilja; Riedhammer, Korbinian; Bocklet, Tobias

Abstract

Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization. This paper investigates the effectiveness of neural embeddings for the detection of vocal fatigue. We compare x-vectors, ECAPA-TDNN, and wav2vec 2.0 embeddings on a corpus of academic spoken English. Low-dimensional mappings of the data reveal that neural embeddings capture information about the change in vocal characteristics of a speaker during prolonged voice usage. We show that vocal fatigue can be reliably predicted using all three kinds of neural embeddings after only 50 minutes of continuous speaking when temporal smoothing and normalization are applied to the extracted embeddings. We employ support vector machines for classification and achieve accuracy scores of 81% using x-vectors, 85% using ECAPA-TDNN embeddings, and 82% using wav2vec 2.0 embeddings as input features. We obtain an accuracy score of 76% when the trained system is applied to a different speaker and recording environment without any adaptation.
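The abstract says temporal smoothing and normalization are applied to the extracted embeddings before SVM classification, but it does not specify which smoothing or normalization is used. The sketch below is one plausible reading, not the authors' code: it assumes a centered moving average over consecutive embedding frames and a per-recording (per-speaker) z-score for each embedding dimension. The function names `smooth_embeddings` and `normalize_per_speaker` and the `radius` parameter are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence

Vector = List[float]


def smooth_embeddings(frames: Sequence[Sequence[float]], radius: int = 2) -> List[Vector]:
    """Assumed smoothing: centered moving average over time, per dimension.

    Each output frame averages the input frames within `radius` steps on
    either side; the window is truncated at the start and end of the sequence.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(frames)
    smoothed: List[Vector] = []
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        window = frames[lo:hi]
        # zip(*window) yields one tuple per embedding dimension
        smoothed.append([fmean(dim) for dim in zip(*window)])
    return smoothed


def normalize_per_speaker(frames: Sequence[Sequence[float]], eps: float = 1e-8) -> List[Vector]:
    """Assumed normalization: z-score each dimension with statistics
    computed over all frames of one speaker's recording."""
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) for d in dims]
    # eps avoids division by zero for constant dimensions
    return [[(x - m) / (s + eps) for x, m, s in zip(row, means, stds)] for row in frames]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for three consecutive segments
    toy = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
    features = normalize_per_speaker(smooth_embeddings(toy, radius=1))
    print(features)
```

The resulting feature vectors would then be passed to a support vector machine classifier (for example scikit-learn's `SVC`), which is not shown here. Normalizing per recording is a design choice aimed at removing speaker- and channel-specific offsets, which may matter when a trained system is applied to a different speaker and recording environment.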
