DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation

Paper Title

DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation

Authors

Fu, Yihui; Wu, Jian; Hu, Yanxin; Xing, Mengtao; Xie, Lei

Abstract

In this paper, we propose a multi-channel network for simultaneous speech dereverberation, enhancement and separation (DESNet). To enable gradient propagation and joint optimization, we adopt the attentional selection mechanism over multi-channel features, which was originally proposed in the end-to-end unmixing, fixed-beamforming and extraction (E2E-UFE) structure. Furthermore, the deep complex convolutional recurrent network (DCCRN) is used as the structure for speech unmixing, and a neural-network-based weighted prediction error (WPE) module is cascaded beforehand for speech dereverberation. We also introduce a staged SNR strategy and a symphonic loss for training the network to further improve the final performance. Experiments show that in the non-dereverberated case, the proposed DESNet outperforms DCCRN and most state-of-the-art structures in speech enhancement and separation, while in the dereverberated scenario, DESNet also shows improvements over the cascaded WPE-DCCRN networks.
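The abstract refers to SNR-based training but does not define the staged SNR strategy or the symphonic loss, so neither is reproduced here. As a rough illustration only, the sketch below computes the plain signal-to-noise ratio between a reference signal and an estimate, and its negative as a quantity a network could minimize. The function names `snr_db` and `negative_snr_loss` and the `eps` stabilizer are assumptions made for this example, not names or settings from the paper.

```python
import math


def snr_db(reference, estimate, eps=1e-12):
    """Plain SNR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    `eps` is an assumed stabilizer that avoids division by zero and
    log(0); it is not a value taken from the paper.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (noise_power + eps))


def negative_snr_loss(reference, estimate):
    """Negative SNR, so that minimizing the loss maximizes the SNR."""
    return -snr_db(reference, estimate)


if __name__ == "__main__":
    clean = [1.0, 0.0, -1.0, 0.0]
    enhanced = [0.9, 0.0, -0.9, 0.0]
    print(f"SNR: {snr_db(clean, enhanced):.2f} dB")
```

A higher SNR means the estimate is closer to the reference, which is why the negated value is a natural minimization target. How DESNet stages or combines such terms across its dereverberation, enhancement and separation outputs is described in the paper itself, not in this sketch.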
