Paper Title
A Multi-term and Multi-task Analyzing Framework for Affective Analysis in-the-wild
Paper Authors
Paper Abstract
Human affective recognition is an important factor in human-computer interaction. However, methods developed with in-the-wild data are not yet accurate enough for practical use. In this paper, we introduce an affective recognition method focusing on valence-arousal (VA) and expression (EXP) that was submitted to the Affective Behavior Analysis in-the-wild (ABAW) 2020 Contest. Since affective behaviors have many observable features, each with its own time frame, we introduced multiple optimized time windows (short-term, middle-term, and long-term) into our analyzing framework for extracting feature parameters from video data. Moreover, data from multiple modalities are used, including action units, head pose, gaze, posture, and ResNet-50 or EfficientNet features, which are optimized during feature extraction. We then generated an affective recognition model for each time window and ensembled these models together. We also fused the valence, arousal, and expression models to enable multi-task learning, considering that the basic psychological states behind facial expressions are closely related to one another. On the validation set, our model achieved a valence-arousal score of 0.498 and a facial expression score of 0.471. These validation results show that our proposed framework can effectively improve estimation accuracy and robustness.
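
To make the multi-term, multi-task structure described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' published implementation: the class names, the GRU encoder per time window, and all dimensions (feature size, hidden size, window lengths) are illustrative assumptions. It shows one branch per time window (short/middle/long-term), concatenation of the branch outputs, and separate heads for valence, arousal, and expression trained jointly.

```python
# Illustrative sketch of a multi-term, multi-task affect model.
# All names and dimensions are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class TermBranch(nn.Module):
    """Encodes one time window (e.g., short-, middle-, or long-term) of frame features."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_len, feat_dim) -> last hidden state: (batch, hidden_dim)
        _, h = self.rnn(x)
        return h[-1]


class MultiTermMultiTaskModel(nn.Module):
    """One branch per time window, a shared trunk, and task-specific heads."""

    def __init__(self, feat_dim: int = 256, hidden_dim: int = 128,
                 n_terms: int = 3, n_expressions: int = 7):
        super().__init__()
        self.branches = nn.ModuleList(
            TermBranch(feat_dim, hidden_dim) for _ in range(n_terms))
        self.trunk = nn.Sequential(
            nn.Linear(hidden_dim * n_terms, hidden_dim), nn.ReLU())
        # Multi-task heads: continuous valence/arousal plus categorical expression.
        self.valence_head = nn.Linear(hidden_dim, 1)
        self.arousal_head = nn.Linear(hidden_dim, 1)
        self.expression_head = nn.Linear(hidden_dim, n_expressions)

    def forward(self, windows: list[torch.Tensor]):
        # windows: one (batch, window_len, feat_dim) tensor per time window,
        # e.g. short-, middle-, and long-term crops of the same video clip.
        fused = torch.cat([b(w) for b, w in zip(self.branches, windows)], dim=-1)
        z = self.trunk(fused)
        return (torch.tanh(self.valence_head(z)),   # valence in [-1, 1]
                torch.tanh(self.arousal_head(z)),   # arousal in [-1, 1]
                self.expression_head(z))            # expression class logits


if __name__ == "__main__":
    model = MultiTermMultiTaskModel()
    # Fake multi-modal frame features for three hypothetical window lengths.
    windows = [torch.randn(4, n, 256) for n in (8, 32, 128)]
    valence, arousal, expr_logits = model(windows)
    print(valence.shape, arousal.shape, expr_logits.shape)
```

In this reading, multi-task learning comes from the shared trunk feeding all three heads, so a joint loss (e.g., CCC-based losses for valence/arousal plus cross-entropy for expression) lets the correlated tasks regularize one another; the paper's ensembling over per-window models could additionally be realized by averaging predictions from several such models trained with different window configurations.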