TDCGAN: Temporal Dilated Convolutional Generative Adversarial Network for End-to-End Speech Enhancement

Paper Title

TDCGAN: Temporal Dilated Convolutional Generative Adversarial Network for End-to-End Speech Enhancement

Authors

Shuaishuai Ye, Xinhui Hu, Xinkang Xu

Abstract

To further address the performance degradation caused by ignoring phase information in conventional speech enhancement systems, we propose a temporal dilated convolutional generative adversarial network (TDCGAN) within an end-to-end speech enhancement architecture. For the first time, we introduce a temporal dilated convolutional network with depthwise separable convolutions into the GAN structure, so that the receptive field can be greatly increased without increasing the number of parameters. We are also the first to explore the effect of a signal-to-noise ratio (SNR) penalty term, used as regularization of the generator's loss function, on improving the SNR of the enhanced speech. The experimental results demonstrate that the proposed method outperforms the state-of-the-art end-to-end GAN-based speech enhancement. Moreover, compared with previous GAN-based methods, the proposed TDCGAN greatly reduces the number of parameters. As expected, the work also demonstrates that the SNR penalty term as regularization is more effective than $L1$ regularization at improving the SNR of the enhanced speech.
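
The abstract's two technical claims can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' implementation: the channel count (256), kernel size (3), number of layers (8), and the doubling dilation schedule are illustrative assumptions that do not come from the paper, and the helper names are hypothetical. It compares the weight count of a standard 1-D convolution with a depthwise separable one, shows how dilation widens the receptive field at a fixed parameter count, and computes SNR in dB using the standard definition. How the paper turns SNR into a penalty term is not stated in the abstract, so that step is left out.

```python
import math


def conv1d_params(in_ch, out_ch, kernel):
    """Weights in a standard 1-D convolution (bias ignored)."""
    return in_ch * out_ch * kernel


def depthwise_separable_params(in_ch, out_ch, kernel):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    depthwise = in_ch * kernel   # one filter per input channel
    pointwise = in_ch * out_ch   # 1x1 conv that mixes channels
    return depthwise + pointwise


def receptive_field(kernel, dilations):
    """Receptive field of stacked stride-1 convolutions.

    Each layer adds (kernel - 1) * dilation samples of context.
    """
    return 1 + sum((kernel - 1) * d for d in dilations)


def snr_db(clean, enhanced):
    """Standard SNR in dB between a clean signal and its estimate."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    return 10.0 * math.log10(signal_power / noise_power)


# Assumed, illustrative sizes (not taken from the paper).
standard = conv1d_params(256, 256, 3)
separable = depthwise_separable_params(256, 256, 3)

# Dilation changes which samples a filter sees, not how many weights it has.
dilations = [2 ** i for i in range(8)]        # 1, 2, 4, ..., 128
rf = receptive_field(3, dilations)
rf_undilated = receptive_field(3, [1] * 8)

print(standard, separable)    # weight counts per layer
print(rf, rf_undilated)       # receptive field in samples
print(snr_db([1.0, 0.0, -1.0, 0.0], [0.9, 0.0, -0.9, 0.0]))
```

Under these assumed sizes, the separable layer needs roughly a third of the weights of the standard layer, and the dilated stack covers 511 samples where the undilated stack of the same depth covers 17.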
