Paper Title
Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement
Paper Authors
Paper Abstract
This paper investigates trade-offs between the number of model parameters and enhanced speech quality by employing several deep tensor-to-vector regression models for speech enhancement. We find that a hybrid architecture, namely CNN-TT, is capable of maintaining good quality performance with a reduced model parameter size. CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality, and a tensor-train (TT) output layer on top to reduce model parameters. We first derive a new upper bound on the generalization power of convolutional neural network (CNN) based vector-to-vector regression models. Then, we provide experimental evidence on the Edinburgh noisy speech corpus to demonstrate that, in single-channel speech enhancement, CNN outperforms DNN at the expense of a small increase in model size. Moreover, CNN-TT slightly outperforms its CNN counterpart while utilizing only 32% of the CNN model parameters, and further performance gains can be attained if the number of CNN-TT parameters is increased to 44% of the CNN model size. Finally, our multi-channel speech enhancement experiments on a simulated noisy WSJ0 corpus demonstrate that the proposed hybrid CNN-TT architecture achieves better enhanced speech quality with smaller parameter sizes than both DNN and CNN models.
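To make the hybrid design concrete, the following is a minimal PyTorch sketch of the general idea: convolutional layers at the bottom extract features, and a tensor-train factored output layer replaces a large dense output layer. This is our own illustration under stated assumptions, not the paper's released code; all layer sizes, the mode factorization (16, 32, 32) → (8, 8, 8), and the TT ranks (1, 4, 4, 1) are hypothetical placeholders.

```python
import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """Linear layer whose weight matrix is stored as tensor-train (TT) cores.

    The flat input dimension must equal prod(in_modes), the output dimension
    equals prod(out_modes), and ranks has length len(in_modes) + 1 with
    ranks[0] == ranks[-1] == 1.
    """

    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == 1 and ranks[-1] == 1
        self.in_modes = list(in_modes)
        self.cores = nn.ParameterList(
            nn.Parameter(0.1 * torch.randn(ranks[k], in_modes[k],
                                           out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        )

    def forward(self, x):
        b = x.shape[0]
        # running tensor: (batch, processed out-modes, rank, remaining in-modes)
        t = x.reshape(b, 1, 1, -1)
        for k, core in enumerate(self.cores):
            # expose the current input mode m_k for contraction
            t = t.reshape(b, t.shape[1], t.shape[2], self.in_modes[k], -1)
            # contract the running rank and input mode with the TT core
            t = torch.einsum('bqrms,rmnt->bqnts', t, core)
            # fold the new output mode into the processed-out-modes axis
            t = t.reshape(b, t.shape[1] * t.shape[2], t.shape[3], -1)
        return t.reshape(b, -1)  # (batch, prod(out_modes))


class CNNTT(nn.Module):
    """Illustrative CNN-TT: a small conv feature extractor plus a TT output
    layer. Sizes below are placeholders, not the paper's configuration."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # flattened conv output: 32 channels * 8 * 64 = 16384 = 16 * 32 * 32
        self.tt_out = TTLinear(in_modes=(16, 32, 32),
                               out_modes=(8, 8, 8),   # 512-dim regression target
                               ranks=(1, 4, 4, 1))

    def forward(self, x):  # x: (batch, 1, 8, 64) spectrogram patch
        return self.tt_out(self.features(x).flatten(1))


if __name__ == "__main__":
    y = CNNTT()(torch.randn(2, 1, 8, 64))
    print(y.shape)  # torch.Size([2, 512])
```

With these placeholder shapes, the TT output layer stores roughly 5.6K parameters in its three cores, versus about 8.4M for a full 16384-by-512 dense layer, which illustrates how the TT factorization yields the parameter reduction the abstract reports.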