Conditional Diffusion Probabilistic Model for Speech Enhancement

Paper title

Conditional Diffusion Probabilistic Model for Speech Enhancement

Authors

Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Abstract

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model named conditional diffusion probabilistic model that, in its reverse process, can adapt to non-Gaussian real noises in the estimated speech signal. In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models, and investigate the generalization capability of our models to other datasets with noise characteristics unseen during training.
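The abstract says the observed noisy speech is incorporated into the diffusion process. The sketch below shows one way that conditioning could enter the forward process. It assumes that the mean of x_t interpolates from the clean signal x_0 toward the noisy observation y as t grows, and it keeps a standard DDPM-style variance. The linear beta schedule, the interpolation weight m_t, and all function names are hypothetical illustrations, not the paper's exact formulation. The learned reverse process is not shown.

```python
import math
import random


def make_schedule(T, beta_start=1e-4, beta_end=0.05):
    """Hypothetical linear beta schedule; returns the cumulative products alpha_bar_t."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def interp_weight(t, T):
    """Hypothetical interpolation weight m_t: 0 at the first step, 1 at the last."""
    return t / (T - 1)


def forward_moments(x0, y, t, alpha_bars):
    """Mean and std of the assumed conditional q(x_t | x0, y).

    Assumed form: mean = sqrt(alpha_bar_t) * ((1 - m_t) * x0 + m_t * y),
    std = sqrt(1 - alpha_bar_t), i.e. the usual DDPM variance.
    """
    m = interp_weight(t, len(alpha_bars))
    scale = math.sqrt(alpha_bars[t])
    mean = [scale * ((1.0 - m) * a + m * b) for a, b in zip(x0, y)]
    std = math.sqrt(1.0 - alpha_bars[t])
    return mean, std


def forward_sample(x0, y, t, alpha_bars, rng):
    """Draw x_t = mean + std * eps with eps ~ N(0, I), sample by sample."""
    mean, std = forward_moments(x0, y, t, alpha_bars)
    return [mu + std * rng.gauss(0.0, 1.0) for mu in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    T = 50
    alpha_bars = make_schedule(T)
    clean = [math.sin(2 * math.pi * n / 16) for n in range(16)]  # toy "clean speech"
    noisy = [c + 0.3 * rng.gauss(0.0, 1.0) for c in clean]       # toy noisy observation
    for t in (0, T // 2, T - 1):
        _, std = forward_moments(clean, noisy, t, alpha_bars)
        x_t = forward_sample(clean, noisy, t, alpha_bars, rng)
        print(f"t={t:2d}  std={std:.3f}  first sample={x_t[0]:+.3f}")
```

Under this assumption, early steps stay centered on the clean signal and late steps are centered on the scaled noisy observation. A reverse process started from y therefore begins near the actual noisy input rather than from pure Gaussian noise.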
