Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks

Paper Title

Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks

Paper Authors

Ristea, Nicolae-Catalin; Dutu, Liviu Cristian; Radoi, Anamaria

Paper Abstract

Emotion recognition has become an important field of research in the human-computer interaction domain. The latest advancements in the field show that combining visual with audio information leads to better results than using a single source of information alone. From a visual point of view, a human emotion can be recognized by analyzing the person's facial expression. More precisely, the emotion can be described through a combination of several Facial Action Units. In this paper, we propose a system that is able to recognize emotions with a high accuracy rate and in real time, based on deep Convolutional Neural Networks. To increase the accuracy of the recognition system, we also analyze the speech data and fuse the information coming from both sources, i.e., visual and audio. Experimental results show the effectiveness of the proposed scheme for emotion recognition and the importance of combining visual with audio data.
