Paper Title
Dense CNN with Self-Attention for Time-Domain Speech Enhancement
Paper Authors
Abstract
Speech enhancement in the time domain has become increasingly popular in recent years, owing to its capability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules aid feature extraction through a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and the predicted noise. Even though the proposed loss is based on magnitudes only, a constraint imposed by noise prediction ensures that the loss enhances both magnitude and phase. Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
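The key idea in the abstract is that summing two magnitude terms, one on the enhanced speech and one on a predicted noise constrained to equal noisy minus enhanced, implicitly penalizes phase errors as well. The following is a minimal NumPy sketch of such a loss, not the paper's exact formulation: the `stft_mag` helper, the window/FFT settings, and the plain L1 combination of the two terms are all illustrative assumptions.

```python
import numpy as np

def stft_mag(x, n_fft=64, hop=32):
    """Magnitude STFT via Hann-windowed frames and the real FFT
    (a simplified stand-in for a full STFT implementation)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def magnitude_noise_loss(enhanced, clean, noisy):
    """Hypothetical magnitude-domain loss on enhanced speech and
    predicted noise. The predicted noise is defined as noisy - enhanced,
    so matching both magnitudes constrains the phase of the estimate
    even though no phase term appears explicitly."""
    pred_noise = noisy - enhanced        # constraint: estimates sum to noisy
    true_noise = noisy - clean
    l_speech = np.mean(np.abs(stft_mag(enhanced) - stft_mag(clean)))
    l_noise = np.mean(np.abs(stft_mag(pred_noise) - stft_mag(true_noise)))
    return l_speech + l_noise
```

With this constraint, an estimate whose magnitude matches clean speech but whose phase is wrong yields a residual `noisy - enhanced` whose magnitude no longer matches the true noise, so the second term is penalized.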