Paper Title
A Study of Incorporating Articulatory Movement Information in Speech Enhancement
Paper Authors
Paper Abstract
Although deep learning algorithms are widely used to improve speech enhancement (SE) performance, that performance remains limited under highly challenging conditions, such as unseen noise or low signal-to-noise ratios (SNRs). This study presents a pilot investigation of a novel multimodal audio-articulatory-movement SE (AAMSE) model designed to improve SE performance under such conditions. Articulatory movement features and acoustic signals were used as inputs to waveform-mapping-based and spectral-mapping-based SE systems with three fusion strategies. In addition, an ablation study was conducted to evaluate SE performance using a limited number of articulatory movement sensors. Experimental results confirm that, by combining the two modalities, the AAMSE model notably improves SE performance in terms of speech quality and intelligibility compared with conventional audio-only SE baselines.