Compact Graph Architecture for Speech Emotion Recognition

Paper Title

Compact Graph Architecture for Speech Emotion Recognition

Authors

Shirian, A., Guha, T.

Abstract

We propose a deep graph approach to address the task of speech emotion recognition. A compact, efficient and scalable way to represent data is in the form of graphs. Following the theory of graph signal processing, we propose to model the speech signal as a cycle graph or a line graph. Such a graph structure enables us to construct a Graph Convolution Network (GCN)-based architecture that can perform an accurate graph convolution, in contrast to the approximate convolution used in standard GCNs. We evaluated the performance of our model for speech emotion recognition on the popular IEMOCAP and MSP-IMPROV databases. Our model outperforms standard GCN and other relevant deep graph architectures, indicating the effectiveness of our approach. When compared with existing speech emotion recognition methods, our model achieves comparable performance to the state-of-the-art with significantly fewer learnable parameters (~30K), indicating its applicability in resource-constrained devices.
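To make the graph construction in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each node is one speech frame, connected only to its temporal neighbours. The function names (`cycle_adjacency`, `line_adjacency`, `laplacian`, `cycle_graph_filter`) and the example filter are hypothetical, chosen for illustration. The sketch relies on a standard graph signal processing fact: the Laplacian of a cycle graph is circulant, so the discrete Fourier transform diagonalizes it and a spectral filter can be applied exactly. Whether this matches the paper's exact formulation of "accurate graph convolution" is an assumption.

```python
import numpy as np


def cycle_adjacency(n):
    """Adjacency matrix of an undirected cycle graph on n nodes (frames)."""
    a = np.zeros((n, n))
    idx = np.arange(n)
    a[idx, (idx + 1) % n] = 1.0  # frame i -> frame i+1 (wrapping around)
    a[(idx + 1) % n, idx] = 1.0  # symmetric edge
    return a


def line_adjacency(n):
    """Adjacency matrix of a line (path) graph: the cycle without the wrap-around edge."""
    a = cycle_adjacency(n)
    a[0, n - 1] = 0.0
    a[n - 1, 0] = 0.0
    return a


def laplacian(a):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(a.sum(axis=1)) - a


def cycle_graph_filter(x, h):
    """Apply the spectral filter h(L) to node features x of shape (n, d).

    For a cycle graph the Laplacian eigenvalues are 2 - 2*cos(2*pi*k/n) and
    the eigenvectors are the DFT basis, so no numerical eigendecomposition
    or polynomial approximation is needed.
    """
    n = x.shape[0]
    lam = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n) / n)
    spectrum = np.fft.fft(x, axis=0)            # graph Fourier transform
    filtered = h(lam)[:, None] * spectrum       # scale each graph frequency
    return np.fft.ifft(filtered, axis=0).real   # back to the node domain


if __name__ == "__main__":
    n_frames, n_features = 8, 3  # illustrative sizes, not from the paper
    x = np.random.default_rng(0).normal(size=(n_frames, n_features))
    lap = laplacian(cycle_adjacency(n_frames))

    # Hypothetical low-pass filter h(lambda) = 1 - 0.25 * lambda.
    dense = (np.eye(n_frames) - 0.25 * lap) @ x
    fast = cycle_graph_filter(x, lambda lam: 1.0 - 0.25 * lam)
    print(np.allclose(dense, fast))
```

The dense matrix product and the FFT-based path compute the same filter here, which is the sense in which the cycle structure permits an exact rather than approximated spectral operation. The line graph has no wrap-around edge, so its Laplacian is not circulant and the FFT shortcut in `cycle_graph_filter` does not apply to it directly.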
