FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement

Paper Title

FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement

Authors

Chen, Jun; Wang, Zilin; Tuo, Deyi; Wu, Zhiyong; Kang, Shiyin; Meng, Helen

Abstract

The previously proposed FullSubNet has achieved outstanding performance in the Deep Noise Suppression (DNS) Challenge and attracted much attention. However, it still suffers from issues such as input-output mismatch and coarse processing of frequency bands. In this paper, we propose an extended single-channel real-time speech enhancement framework called FullSubNet+ with the following significant improvements. First, we design a lightweight multi-scale time-sensitive channel attention (MulCA) module, which adopts multi-scale convolution and a channel attention mechanism to help the network focus on more discriminative frequency bands for noise reduction. Then, to make full use of the phase information in noisy speech, our model takes the magnitude, real and imaginary spectrograms together as inputs. Moreover, by replacing the long short-term memory (LSTM) layers in the original full-band model with stacked temporal convolutional network (TCN) blocks, we design a more efficient full-band module called the full-band extractor. Experimental results on the DNS Challenge dataset show the superior performance of FullSubNet+, which reaches state-of-the-art (SOTA) performance and outperforms other existing speech enhancement approaches.
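The abstract describes two input-side ideas: feeding the magnitude, real and imaginary spectrograms together, and reweighting frequency bands with a multi-scale, time-sensitive channel attention. The NumPy sketch below illustrates both under stated assumptions. The STFT settings (512-point periodic Hann window, hop 256), the kernel sizes (3, 5, 10), the causal padding and the random, untrained weights are illustrative choices, and `multiscale_channel_attention` is a hypothetical, simplified squeeze-and-excitation-style stand-in, not the paper's MulCA module. The TCN-based full-band extractor is not covered.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Complex STFT with a periodic Hann window; returns shape (n_fft // 2 + 1, frames)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def three_channel_input(spec):
    """Stack magnitude, real and imaginary spectrograms: shape (3, freq, frames)."""
    return np.stack([np.abs(spec), spec.real, spec.imag])


def multiscale_channel_attention(feat, kernel_sizes=(3, 5, 10), seed=0):
    """Hypothetical, simplified multi-scale channel attention (not the paper's MulCA).

    feat: (freq, frames); each frequency bin is treated as one channel.
    Weights are random and untrained: this only shows the data flow.
    Returns the reweighted features and the per-bin, per-frame weights.
    """
    rng = np.random.default_rng(seed)
    freq, frames = feat.shape
    scales = []
    for k in kernel_sizes:
        kernel = rng.standard_normal(k) / k
        # Left-pad with k - 1 zeros so frame t only sees frames <= t (causal).
        padded = np.pad(feat, ((0, 0), (k - 1, 0)))
        scales.append(np.stack([np.convolve(row, kernel, mode="valid") for row in padded]))
    descriptor = np.mean(scales, axis=0)  # fuse the scales: (freq, frames)
    hidden = max(freq // 4, 1)
    w1 = rng.standard_normal((hidden, freq)) / np.sqrt(freq)
    w2 = rng.standard_normal((freq, hidden)) / np.sqrt(hidden)
    # Two fully connected layers across frequency, applied to every frame separately,
    # so the attention weights can change over time.
    logits = w2 @ np.maximum(w1 @ descriptor, 0.0)
    weights = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, shape (freq, frames)
    return feat * weights, weights


# Usage: a 1 kHz tone sampled at 16 kHz, standing in for noisy speech.
sr = 16000
t = np.arange(sr) / sr
noisy = np.sin(2 * np.pi * 1000 * t)
x = three_channel_input(stft(noisy))
enhanced_mag, att = multiscale_channel_attention(x[0])
print(x.shape, att.shape)  # (3, 257, 61) (257, 61)
```

Computing the attention per frame, rather than from a single time-averaged descriptor, is one possible reading of "time-sensitive"; the paper's exact layer configuration may differ.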
