CGCNN: Complex Gabor Convolutional Neural Network on Raw Speech

Paper title

CGCNN: Complex Gabor Convolutional Neural Network on raw speech

Authors

Paul-Gauthier Noé, Titouan Parcollet, Mohamed Morchid

Abstract

Convolutional Neural Networks (CNNs) have been used in Automatic Speech Recognition (ASR) to learn representations directly from the raw signal instead of hand-crafted acoustic features, providing a richer and lossless input signal. Recent research proposes injecting prior acoustic knowledge into the first convolutional layer by integrating the shape of the impulse responses, in order to increase both the interpretability of the learnt acoustic model and its performance. We propose to combine the complex Gabor filter with complex-valued deep neural networks to replace the usual CNN weight kernels, taking full advantage of the filter's optimal time-frequency resolution and of the complex domain. Experiments on the TIMIT phoneme recognition task show that the proposed approach reaches top-of-the-line performance while remaining interpretable.
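
To make the idea of a Gabor-shaped first layer concrete, here is a minimal NumPy sketch of a complex Gabor filter bank applied to a raw waveform. It is an illustration only, not the authors' implementation: the function names, the bandwidth-to-envelope mapping, the kernel length, and the 16 kHz sample rate are all assumptions made for this example. In a network of this kind, the center frequency and bandwidth of each filter would be the learnable parameters, rather than every tap of the kernel.

```python
import numpy as np


def complex_gabor_kernel(center_freq_hz, bandwidth_hz, kernel_size=401, sample_rate=16000):
    """Complex Gabor impulse response: Gaussian envelope times a complex sinusoid.

    Hypothetical helper for illustration; the parameterisation is an assumption.
    """
    # Time axis in seconds, centred on zero.
    t = (np.arange(kernel_size) - (kernel_size - 1) / 2) / sample_rate
    # Assumed mapping: the envelope's standard deviation in time is the inverse
    # of the (angular) bandwidth, so a wider band gives a shorter filter.
    sigma = 1.0 / (2.0 * np.pi * bandwidth_hz)
    envelope = np.exp(-0.5 * (t / sigma) ** 2)
    carrier = np.exp(2j * np.pi * center_freq_hz * t)
    return envelope * carrier


def gabor_filterbank(signal, center_freqs_hz, bandwidths_hz, **kernel_kwargs):
    """Convolve a real 1-D signal with one complex Gabor kernel per channel."""
    kernels = [
        complex_gabor_kernel(f, b, **kernel_kwargs)
        for f, b in zip(center_freqs_hz, bandwidths_hz)
    ]
    # Output has shape (num_filters, len(signal)) and a complex dtype.
    return np.stack([np.convolve(signal, k, mode="same") for k in kernels])


# Usage: a 1 kHz tone should excite the 1 kHz filter far more than the 4 kHz one.
sample_rate = 16000
time = np.arange(1600) / sample_rate
tone = np.sin(2 * np.pi * 1000 * time)
out = gabor_filterbank(tone, [1000.0, 4000.0], [100.0, 100.0], sample_rate=sample_rate)
magnitudes = np.abs(out).mean(axis=1)
```

Each output channel is complex-valued, so it carries both magnitude and phase and could be passed on to complex-valued layers. Only two numbers per filter define the kernel, which is what keeps such a layer easy to inspect.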
