Paper Title
Multimodal Continuous Emotion Recognition using Deep Multi-Task Learning with Correlation Loss
Paper Authors
Paper Abstract
In this study, we focus on continuous emotion recognition using body motion and speech signals to estimate Activation, Valence, and Dominance (AVD) attributes. A semi-end-to-end network architecture is proposed, in which both extracted features and raw signals are fed to the network, and the network is trained with multi-task learning (MTL) rather than the state-of-the-art single-task learning (STL). Furthermore, correlation losses, the Concordance Correlation Coefficient (CCC) and the Pearson Correlation Coefficient (PCC), are used as optimization objectives during training. Experiments are conducted on the CreativeIT and RECOLA databases, and evaluations are performed using the CCC metric. To highlight the effects of MTL, correlation losses, and multi-modality, we compare, respectively, the performance of MTL against STL, the CCC loss against the mean square error (MSE) loss and the PCC loss, and multi-modality against single modality. We observe significant performance improvements with MTL training over STL, especially for the estimation of valence. Furthermore, the CCC loss achieves more than 7% CCC improvement on CreativeIT and 13% improvement on RECOLA over the MSE loss.
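As a rough illustration of the correlation losses mentioned in the abstract, below is a minimal PyTorch-style sketch of the CCC and PCC losses (1 - CCC and 1 - PCC) and of a simple multi-task sum over the three AVD attributes. The function names, tensor shapes, and the multi-task combination are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, not the authors' code: correlation losses (1 - CCC, 1 - PCC),
# assuming PyTorch and 1-D tensors of frame-level predictions / annotations
# for a single attribute (activation, valence, or dominance).
import torch


def pcc(pred: torch.Tensor, gold: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Pearson Correlation Coefficient between prediction and annotation."""
    pred_c = pred - pred.mean()
    gold_c = gold - gold.mean()
    denom = torch.sqrt((pred_c ** 2).sum()) * torch.sqrt((gold_c ** 2).sum()) + eps
    return (pred_c * gold_c).sum() / denom


def ccc(pred: torch.Tensor, gold: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Concordance Correlation Coefficient: unlike PCC, it also penalizes
    mean and scale mismatch between prediction and annotation."""
    pred_mean, gold_mean = pred.mean(), gold.mean()
    pred_var = ((pred - pred_mean) ** 2).mean()
    gold_var = ((gold - gold_mean) ** 2).mean()
    covar = ((pred - pred_mean) * (gold - gold_mean)).mean()
    return 2.0 * covar / (pred_var + gold_var + (pred_mean - gold_mean) ** 2 + eps)


def ccc_loss(pred: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    return 1.0 - ccc(pred, gold)


def pcc_loss(pred: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    return 1.0 - pcc(pred, gold)


# Illustrative multi-task objective: sum the CCC loss over the three AVD heads.
# `preds` and `golds` are assumed to be dicts of 1-D tensors keyed by attribute.
def mtl_ccc_loss(preds: dict, golds: dict) -> torch.Tensor:
    return sum(ccc_loss(preds[k], golds[k])
               for k in ("activation", "valence", "dominance"))
```

Maximizing CCC (rather than only PCC or minimizing MSE) is attractive here because the evaluation metric is itself CCC, so the training objective and the evaluation criterion are aligned.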